  1. Sep 2023
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary of the major findings -

      1) The authors used saturation mutagenesis and directed evolution to mutate the highly conserved fusion loop (98 DRGWGNGCGLFGK 110) of the Envelope (E) glycoprotein of Dengue virus (DENV). They created 2 libraries with parallel mutations at amino acids 101, 103, 105-107, and 101-105 respectively. The in vitro transcribed RNA from the two plasmid libraries was electroporated separately into Vero and C6/36 cells and passaged thrice in each of these cells. They successfully recovered a variant N103S/G106L from Library 1 in C6/36 cells, which represented 95% of the sequence population and contained another mutation in E outside the fusion loop (T171A). Library 2 was unsuccessful in either cell type.

      2) The fusion loop mutant virus called D2-FL (N103S/G106L) was created through reverse genetics. Another variant called D2-FLM was also created, which in addition to the fusion loop mutations, also contains a previously published, evolved, and optimized prM-furin cleavage sequence that results in a mature version of the virus (with lower prM content). Both D2-FL and D2-FLM viruses grew comparably to wild type virus in mosquito (C6/36) cells but their infectious titers were 2-2.5 log lower than wild type virus when grown in mammalian (Vero) cells. These viruses were not compromised in thermostability, and the mechanism for attenuation in Vero cells remains unknown.

      4) Next, the authors probed the neutralization of these viruses using a panel of monoclonal antibodies (mAbs) against fusion loop and domain I, II and III of E protein, and against prM protein. As intended, neutralization by fusion loop mAbs was reduced or impaired for both D2-FL and D2-FLM, compared to wild type DENV2. D2-FLM virus was equivalent to wild type with respect to neutralization by domain I, II, and III antibodies tested (except domain II-C10 mAb) suggesting an intact global antigenic landscape of the mutant virion. As expected, D2-FLM was also resistant to neutralization by prM mAbs (D2-FL was not tested in this batch of experiments).

      5) Finally, the authors evaluated neutralization in the context of polyclonal serum from convalescent humans (n=6) and experimentally infected non-human primates (n=9) at different time points (27 total samples). Homotypic sera (DENV2) neutralized D2-FL, D2-FLM, and wild type DENV similarly, suggesting that the contribution of fusion loop and prM epitopes is insignificant in a serotype-specific neutralization response. However, heterotypic sera (DENV4) neutralized D2-FL and D2-FLM less potently than wild type DENV2, especially at later time points, demonstrating the contribution of fusion loop- and prM-specific antibodies to heterotypic neutralization.

      Impact of the study-

      1) The engineered D2-FL and D2-FLM viruses are valuable reagents to probe antibodies targeting the fusion loop and prM in the overall polyclonal response to DENV.

      2) Though more work is needed, these viruses can facilitate the design of a new generation of DENV vaccine that does not elicit fusion loop- and prM-specific antibodies, which are often poorly neutralizing and lead to antibody-dependent enhancement effect (ADE).

      3) This work can be extended to other members of the flavivirus family.

      4) A broader impact of their work is a reminder that conserved amino acids may not always be critical for function and therefore should not be immediately dismissed in substitution/mutagenesis/protein design efforts.

      Evaluating this study in the context of prior literature -

      The authors write "Although the extreme conservation and critical role in entry have led to it being traditionally considered impossible to change the fusion loop, we successfully tested the hypothesis that massively parallel directed evolution could produce viable DENV fusion-loop mutants that were still capable of fusion and entry, while altering the antigenic footprint."

      ".....Previously, a single study on WNV successfully generated a viable virus with a single mutation at the fusion loop, although it severely attenuated neurovirulence. Otherwise, it has not been generated in DENV or other mosquito-borne flaviviruses"

      The above claims are a bit overstated. In the context of other flaviviruses:

      • A previous study applied a similar saturation mutagenesis approach to the full length E protein of Zika virus and found that while the conserved fusion loop was mutationally constrained, some mutations, including at amino acid residue 106 were tolerated (PMID 31511387).

      • The Japanese encephalitis virus (JEV) SA14-14-2 live vaccine strain contains a L107F mutation in the fusion loop (in addition to other changes elsewhere in the genome) relative to the parental JEV SA14 strain (PMID: 25855730).

      • For tickborne encephalitis virus (TBEV-DENV4 chimera), H104G/L107F double mutant has been described (PMID: 8331735)

      There have also been previous examples of functionally tolerated mutations within the DENV fusion loop:

      • Goncalvez et al., isolated an escape variant of DENV 2 using chimpanzee Fab 1A5, with a mutation in the fusion loop G106V (PMID: 15542644). G106 is also mutated in D2-FL clone (N103S/G106L) described in the current study.

      • In the context of single-round infectious DENV, mutation at site 102 within the fusion loop has been shown to retain infectivity (PMID 31820734).

      We thank the reviewer for these comments. We have adjusted the text above to better reflect and credit the prior literature. The text in the Discussion section has been modified as follows.

      “Previous reported mutations in the fusion loop are mainly derived from experimental evolution using FL-Ab to select for escape mutant or by deep mutational scanning (DMS) of the Env protein for Ab epitope mapping. Mutations in the FL epitope were observed in a DENV2-NGC-V2 (G106V)39, attenuated JEV vaccine strain SA14-14-2 (L107F)40, attenuated WNV-NY99 (L107F)41. While most of the mutations, including the double mutations reported here lead to attenuation of the virus. A recent DMS study showed that Zika-G106A has no observable impact on viral fitness42. Interestingly, we also recovered a mutation G106L, suggesting position 106 and 107 might be the most tolerable position for mutation in mosquito borne flavivirus FL. On the other hand, tick borne flavivirus as well as vector only flavivirus show a more diverse FL composition. The inflexibility of mosquito borne flavivirus might be due to the evolution constraint of the virus to switch between mosquito and vertebrate hosts.”

      Appraisal of the results -

      The data largely support the conclusions, but some improvements and extensions can benefit the work.

      1) Line 92-93: "This major variant comprised ~95% of the population, while the next most populous variant comprised only 0.25% (Figure 1C)".

      What is the sequence of the next most abundant variant?

      The sequence of the next most abundant variant has been added to the text.

      2) Lines 94-95: "Residues W101, C105, and L107 were preserved in our final sequence, supporting the structural importance of these residues." L107F is viable in other flaviviruses.

      We acknowledge that the L107F mutation has been described in other flaviviruses, including the tick-borne flaviviruses DTV and POWV. In JEV, this mutation is associated with viral attenuation. The sentence refers to the fact that, in our libraries, we did not recover variants with mutations at these positions, in contrast to N103 and G106, which are mutated in D2-FL; this indicates lower mutational tolerance at W101, C105, and L107. However, the focus of this manuscript is on engineering a viable DENV that is antigenically distinct in the FL epitope, rather than on which residues are more tolerant of mutation.

      3) Figure 2c: The FLM sample in the western blot shows hardly any E protein, making E/prM quantitation unreliable.

      The samples used in Figure 2C derive from the growth curve endpoint (Figure 2A), in which there is a 1-log difference in viral titer between D2 and D2-FLM. Equivalent volumes of viral supernatant were loaded in the gel, explaining the reduced intensity of the E band in D2-FLM. The higher exposure on the right shows the E band more clearly for D2-FLM. The Western blot assay comparing prM/E ratio as a measure of maturation state was described and validated in our previous study (Tse et al. 2022, mBio). The methods and figure legend have been updated to include greater detail. The polyclonal E antibody was specifically chosen for this study as our previously used monoclonal antibody targeted the fusion loop. The polyclonal antibody was raised against a fragment of E (AA 1-495) and should be minimally affected by the fusion loop mutations.

      4) Lines 149 -151: "Importantly, D2-FL and D2-FLM were resistant to antibodies targeting the fusion loop. While neutralization by 1M7 is reduced by ~2-logs, no neutralization was observed for 1N5, 1L6, and 4G2 for either variant (Figure 3 A)".

      a) Partial neutralization was observed for 1N5, for D2-FL.

      The text has been updated to more accurately describe the 1N5 neutralization data.

      b) Do these mAbs cover the full spectrum of fusion loop antibodies identified thus far in the field?

      We did not test every known fusion loop antibody; instead, we focused on 1M7, 1N5, 1L6, and 4G2, which were previously described by Smith et al. and Crill et al. We also modified the Discussion to reflect the possibility that other FL-Abs are not affected by our mutations.

      “We have tested a panel of FL-Ab; however, we cannot exclude the possibility that other FL-Abs may not be affected by N103S and G106L. However, we have shown that saturation mutagenesis could generate mutants with multiple amino acid changes, and we are currently using D2-FLM as backbone to iteratively evolve additional mutations in FL to further deviate the FL antigenic epitope.”

      c) Are the epitopes known for these mAbs? It would be useful to discuss how the epitope of 1M7 differs from the other mAbs? What are the critical residues?

      Critical residues for these antibodies have been described. They are as follows: 1M7: W101R, W101C, G111R; 1N5: W101R, L107P, L107R, G111R; 1L6: G100A, W101A, F108A; 4G2: G104H, G106Q, L107K. The critical residues for 1M7 differ slightly from the others, perhaps explaining the residual binding to D2-FL. Note that the critical residues previously identified for 1M7 and 1N5 do not overlap with the D2-FLM mutations, suggesting that the FL mutations have an extended effect on the FL antigenic epitope.

      d) Maybe the D2-FL mutant can be further evolved with selection pressure with fusion loop mAbs 1M7 +/-1N5 and/or other fusion loop mAbs.

      We agree that it may be possible to further evolve D2-FL using antibody selection. Although we have not yet performed these experiments, we are currently performing iterative saturation mutagenesis and directed evolution to further evolve away from the natural FL.

      5) It would have been useful to include D2-M for comparison (with evolved furin cleavage sequence but no fusion loop mutations).

      Neutralization data for some of the mAbs against D2-M can be found in our previous study (Tse et al. 2022, mBio), in which no difference in neutralization was observed compared to wild-type DV2. Given the limited supply of anti-DENV NHP and human sera, we did not add D2-M for comparison. Although some insight can be deduced from the D2-FL vs. D2-FLM comparison, we agree that future studies designed to delineate the CR-Ab populations targeting prM, FL, and other CR epitopes should include D2-M for comparison.

      6) Data for polyclonal serum can be better discussed. Table 1 is not discussed much in the text. For the R1160-90dpi-DENV4 sample, D2-FL and D2-FLM are neutralized better than wild type DENV2? The authors' interpretation in lines 181-182 is inconsistent with the data presented in Figure 3C, which suggests that over time, there is INCREASED (not waning) dependence on FL- and prM-specific antibodies for heterotypic neutralization.

      We remade Table 1 to show FRNT50 values as dilution factors instead of inverse dilution factors (dilution factor⁻¹).

      In general, our human convalescent sera from heterotypic infection (DENV1, 3, and 4) showed no to low neutralization of our DENV2. FRNT50s were between 1:40 and 1:200. Given the weak potency of the antisera, it is difficult to compare FRNT50s between DV2-WT and D2-FLM.

      Similarly, in a different NHP cohort (the second NHP cohort shown in Table 1), only one DENV4-infected NHP (R1160) showed a low heterotypic titer against DENV2. The detectable FRNT50s were between 1:50 and 1:90. These values were extrapolated from a single data point (1:40) with above 50% neutralization. Given that the Hill slopes of all the neutralization curves were below 0.5, the FRNT50 values are not reliable.

      In conclusion, we do not think the sera in Table 1 are potent enough to show differences between the viruses. Our intention in showing the negative data in Table 1 is to highlight the heterogeneity of sera from DENV-infected patients and experimentally infected NHPs.

      As the reviewer pointed out, the dependence on FL-Abs increased at later time points (compare the difference between DV2 and D2-FL at 20, 60, and 90 dpi), suggesting that non-FL CR-Abs wane while prM- and FL-Abs do not. We rewrote the sentence as follows:

      “These data suggest that after a single infection, many of the CR Ab responses target prM and the FL and the reliance on these Abs for heterotypic neutralization increase overtime (Figure 3C).”

      Suggestions for further experiments-

      1) It would be interesting to see the phenotype of single mutants N103S and G106L, relative to double mutant N103S/G106L (D2-FL).

      2) The fusion capability of these viruses can be gauged using liposome fusion assay under different pH conditions and different lipids.

      3) Correlative antibody binding vs neutralization data would be useful.

      We thank the reviewer for the suggestions; we agree these would be of interest and, indeed, these studies are currently underway. The single mutants were present in the initial plasmid library but did not enrich after viral production and passage. Two explanations are possible: 1) the stochastic nature of directed evolution prevented a single mutant of similar fitness from becoming enriched; or 2) the two mutations compensate for each other to yield a functional mutant. The second hypothesis highlights the difference between saturation mutagenesis (this study) and DMS (previous studies).

      Fusion capability is indeed very interesting; however, whether the wild-type and mutated FL differ mechanistically in supporting fusion is not the focus of this study. Instead, we are currently working on adapting D2-FLM to mammalian cells. If successful, differences in fusion mechanism between the Vero-adapted virus and D2-FLM with different lipids (insect vs. mammalian) would be of interest.

      We are currently developing a whole-virus ELISA; we avoided using rE monomer for this study because it might miss conformational Abs.

      Reviewer #2 (Public Review):

      Antibody-dependent enhancement (ADE) of Dengue is largely driven by cross-reactive antibodies that target the DENV fusion loop or pre-membrane protein. Screening polyclonal sera for antibodies that bind to these cross-reactive epitopes could increase the successful implementation of a safe DENV vaccine that does not lead to ADE. However, there are few reliable tools to rapidly assess polyclonal sera for epitope targets and ADE potential. Here the authors develop a live viral tool to rapidly screen polyclonal sera for binding to fusion loop and pre-membrane epitopes. The authors performed a deep mutational scan for viable viruses with mutations in the fusion loop (FL). The authors identified two mutations that are functionally tolerated in insect C6/36 cells but lead to defective replication in mammalian Vero cells. These mutant viruses, D2-FL and D2-FLM, were tested for epitope presentation with a panel of monoclonal antibodies and polyclonal sera. The D2-FL and D2-FLM viruses were not neutralized by FL-specific monoclonal antibodies, demonstrating that the FL epitope has been ablated. However, the neutralization data with polyclonal sera contradict the claim that cross-reactive antibody responses targeting the pre-membrane and FL epitopes wane over time.

      Overall, the central conclusion that the engineered viruses can predict epitopes targeted by antibodies is supported by the data and the D2-FL and D2-FLM viruses represent a valuable tool to the DENV research community.

      Reviewer #1 (Recommendations For The Authors):

      1) Line 51-52: "Currently, there is a single approved DENV vaccine, Dengvaxia." Line 56-57: "Other DENV vaccines have been tested or are currently undergoing clinical trial, but thus far none have been approved for use."

      It should be specified for the global audience that this applies to the United States. Takeda's DENV vaccine, QDENGA is approved in Indonesia, European Union, and Brazil.

      The text has been modified to include this information.

      2) Line 62-63: "The core fusion loop-motif DRGWGNGCGLFGK is highly conserved..." Lines 78-80: "We generated two different saturation mutagenesis libraries, each with 5 randomized amino acids: DRGXGXGXXXFGK (Library 1) and DRGXXXXXGLFGK (Library 2)."

      It may be useful for the readers if the amino acid numbers are stated. The core fusion loop motif DRGWGNGCGLFGK (Eaa98-110) is highly conserved. We generated two different saturation mutagenesis libraries, each with 5 randomized amino acids: DRGXGXGXXXFGK (Library 1; Xaa 101,103, 105-7) and DRGXXXXXGLFGK (Library 2; Xaa 101-105).

      This information has been added to the text.
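      The residue numbering suggested above can be sanity-checked with a short sketch. It assumes the motif spans E residues 98-110, as stated in the review; the function and variable names are hypothetical and serve only to illustrate how the two library patterns and the D2-FL substitutions map onto the motif.

```python
# Illustrative sketch only: names below are hypothetical, not from the manuscript.
MOTIF = "DRGWGNGCGLFGK"   # core fusion-loop motif
START = 98                # E residue number of the first motif position

def mask(motif, start, positions):
    """Replace the residues at the given E positions with 'X'."""
    return "".join(
        "X" if start + i in positions else aa
        for i, aa in enumerate(motif)
    )

def substitute(motif, start, changes):
    """Apply substitutions such as {103: ('N', 'S')}, checking the wild-type residue."""
    residues = list(motif)
    for pos, (wild_type, mutant) in changes.items():
        index = pos - start
        if residues[index] != wild_type:
            raise ValueError(f"expected {wild_type} at position {pos}")
        residues[index] = mutant
    return "".join(residues)

LIBRARY_1 = {101, 103, 105, 106, 107}
LIBRARY_2 = {101, 102, 103, 104, 105}

library_1_pattern = mask(MOTIF, START, LIBRARY_1)
library_2_pattern = mask(MOTIF, START, LIBRARY_2)
d2_fl_motif = substitute(MOTIF, START, {103: ("N", "S"), 106: ("G", "L")})

print(library_1_pattern, library_2_pattern, d2_fl_motif)
```

      Under this numbering, the masks reproduce the two library patterns quoted from the manuscript, and the N103S/G106L substitutions fall on the expected wild-type residues.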

      3) Line 91-92: "Bulk Sanger sequencing revealed an additional Env-T171A mutation outside of the fusion-loop region."

      It looks like the mutation T171A is in domain I of the E protein and does not seem to interface with the fusion loop. Is that why it wasn't pursued further?

      The T171A mutation in E was included in the infectious clone for D2-FL and D2-FLM. The text has been modified to clarify this inclusion.

      4) Lines 82-85: "Saturation mutagenesis plasmid libraries were used to produce viral libraries in either C6/36 (Aedes albopictus mosquito) or Vero 81 (African green monkey) cells and passaged three times in their respective cell types."

      a) What was the size of the libraries? How does one make sure that the experimental library actually has all the amino acid combinations that were intended?

      Each library has 5 randomized amino acids, so there are 20^5 = 3.2 million combinations. In these experiments, sequencing of the plasmid libraries revealed about 2 million unique amino acid sequences, or approximately 62.5% library coverage. The actual plasmid diversity is expected to be higher than 2 million because our deep sequencing had limited coverage.
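      The library-size arithmetic in this response can be reproduced in a few lines. This is a minimal sketch: the variable names are hypothetical, and the count of observed sequences is the approximate figure quoted in the response, not a measured value from the sequencing data.

```python
# Illustrative arithmetic only; variable names are hypothetical.
AMINO_ACIDS = 20            # possible residues at each randomized position
RANDOMIZED_POSITIONS = 5    # randomized positions per library

theoretical = AMINO_ACIDS ** RANDOMIZED_POSITIONS
observed_unique = 2_000_000  # approximate number of unique sequences quoted above
coverage = observed_unique / theoretical

print(f"{theoretical:,} combinations; {coverage:.1%} observed coverage")
```

      Because the observed count is a lower bound set by sequencing depth, the computed coverage is likewise a lower bound on the true plasmid diversity.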

      b) The wild type sequence was excluded from the libraries, correct?

      The wild-type sequence was not specifically excluded from the libraries, as there is no easy method to do so. Wild-type sequence was detected in the plasmid libraries but was not selected in the C6/36 library. However, in the Vero library, we recovered WT virus.

      5) Table 1: - Please include in the table description, what the colors indicate.

      We remade Table 1 to show FRNT50 values as dilution factors instead of inverse dilution factors (dilution factor⁻¹) and removed the unnecessary color code. We also added all relevant information to the table legend.

      6) Lines 246-248: "Previously, a single study on WNV successfully generated a viable virus with a single mutation at the fusion loop, although it severely attenuated neurovirulence."

      It may be worthwhile to mention the WNV mutation (L107F) as some readers may be curious about where this mutation is relative to the ones described in this study.

      This information has been added to the text. We also included the previously described FL mutations in flaviviruses in the text.

      Reviewer #2 (Recommendations For The Authors):

      Major Critique:

      • There is a disconnect between Fig 2A and 2C. FL and FLM viruses have much lower levels of prM-E expression in the viral supernatants based on the western blot in 2C. Why isn't E being detected in the Western? Is the particle-to-pfu ratio skewed in the mutant viruses? Is it possible that the polyclonal is targeting the cross-reactive prM and FL epitopes, and if so would using a monoclonal antibody targeting a known DIII-epitope (2D22) yield a different western result? Also, the legend and methods for Fig 2C are not clear. What is actually being tested in the Western blot? Were equivalent volumes of the different viral preps used?

      The samples used in Figure 2C derive from the growth curve endpoint (Figure 2A), in which there is a 1-log difference in viral titer between D2 and D2-FLM. Equivalent volumes of viral supernatant were loaded in the gel, explaining the reduced intensity of the E band in D2-FLM. The higher exposure on the right shows the E band more clearly for D2-FLM. The Western blot assay comparing prM/E ratio as a measure of maturation state was described and validated in our previous study (Tse et al. 2022, mBio) and the methods have been updated to include greater detail. The polyclonal E antibody was specifically chosen for this study as our previously used monoclonal antibody targeted the fusion loop. The polyclonal antibody was raised against a fragment of E (AA 1-495) and should not be affected by the fusion loop mutations. 2D22 is a conformational antibody and does not work in western blot.

      • Table 1: The data within Table 1 is ignored in the text, and some of this data contradicts the central conclusions of the manuscript.

      o A.) Some of the convalescent data contradicts the hypothesis. DS0275 had equivalent neutralization between DV2 and D2-FLM; DS1660 and R1160 (90) had better neutralization against D2-FLM than DV2. Discussion of these samples is warranted.

      o C.) The description in the legend does not adequately describe the table. What do the colors represent? What are the numerical values being displayed? What is in parentheses, (I assume the challenge strain)? The limit of detection is reported as 1:40; 0.25. 1:40 is 0.025 which matches most of the data? There is inadequate description of these experiments in the materials and methods.

      We remade Table 1 to show FRNT50 values as dilution factors instead of inverse dilution factors (dilution factor⁻¹) and removed the unnecessary color code. We also added discussion of Table 1 and clarified the differences between the three serum cohorts in the text, with the corresponding references.

      In general, our human convalescent sera from heterotypic infection (DENV1, 3, and 4) showed no to low neutralization of our DENV2. FRNT50s were between 1:40 and 1:200. Given the weak potency of the antisera, it is difficult to compare FRNT50s between DV2-WT and D2-FLM.

      Similarly, in a different NHP cohort (the second NHP cohort shown in Table 1), only one DENV4-infected NHP (R1160) showed a low heterotypic titer against DENV2. The detectable FRNT50s were between 1:50 and 1:90. These values were extrapolated from a single data point (1:40) that was above 50% neutralization. Given that the Hill slopes of all the neutralization curves were below 0.5, the FRNT50 values are not reliable.

      In conclusion, we do not think the sera in Table 1 are potent enough to show differences between the viruses. Our intention in showing the negative data in Table 1 is to highlight the heterogeneity of sera from DENV-infected patients and experimentally infected NHPs.
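      The caution about extrapolating an FRNT50 from a single data point can be illustrated with a simple Hill-type neutralization curve. This sketch is an assumption for illustration only: the response does not state which curve model was fitted, the function names are hypothetical, and the input values are made up rather than taken from Table 1. The model assumed here is fraction neutralized = 1 / (1 + (d / T)^h), where d is the reciprocal serum dilution, T the reciprocal FRNT50, and h the Hill slope.

```python
# Illustrative sketch under an assumed Hill-type model; not the authors' fitting method.

def fraction_neutralized(reciprocal_dilution, frnt50, hill_slope):
    """Fraction of virus neutralized at a given reciprocal dilution (e.g. 40 for 1:40)."""
    return 1.0 / (1.0 + (reciprocal_dilution / frnt50) ** hill_slope)

def frnt50_from_point(reciprocal_dilution, fraction, hill_slope):
    """Extrapolate the reciprocal FRNT50 from one data point by inverting the model."""
    odds = fraction / (1.0 - fraction)
    return reciprocal_dilution * odds ** (1.0 / hill_slope)

# Hypothetical measurements at 1:40, all slightly above 50% neutralization.
for fraction in (0.52, 0.55, 0.60):
    steep = frnt50_from_point(40, fraction, hill_slope=1.0)
    shallow = frnt50_from_point(40, fraction, hill_slope=0.5)
    print(f"{fraction:.0%} at 1:40 -> 1:{steep:.0f} (slope 1.0), 1:{shallow:.0f} (slope 0.5)")
```

      In this model, a shallow slope raises the odds term to a larger power, so small differences in the single measured point translate into large differences in the extrapolated titer, which is consistent with treating such values as unreliable.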

      Minor critique:

      Figure 1C: Legend is not clear for this panel. What is on the x-axis of the bubble plots? Are these mutations across the entire viral genome or is this just the prM-E sequence?

      The X-axis is a scatter of all of the sequences contained in the library, similar to graphs used for plotting CRISPR screen results. These represent individual sequences from the saturation mutagenesis libraries in the fusion loop of E as described in Figure 1B.

      The wording in Lines 92-94 is not clear. It looks like the T171A mutation was present in 95% of the sequences (Line 92). Yet this sequence was not incorporated into the variant virus. What is the rationale for omitting this mutation in downstream variant virus generation?

      The 95% in Line 92 refers to the variant containing the N103S/G106L mutations, as seen in Figure 1C. The high-throughput sequencing approach did not include residue 171, so the presence of the T171A mutation in combination with the fusion loop mutations cannot be determined. However, the T171A mutation in E was included in the infectious clone for D2-FL and D2-FLM. The text has been modified to clarify this inclusion.

      The authors discuss the potential of the D2-FL or D2-FLM virus as a potential vaccine platform in the abstract, introduction, and conclusion. This is a good idea, but the authors provide no evidence of feasibility in this manuscript.

      The ultimate goal of engineering a viable DENV with a distinct FL antigenic epitope is its use as a live attenuated vaccine. As this is the rationale for the study, we introduce the concept throughout the manuscript. The current study demonstrates that the DENV fusion loop can be mutated to a novel motif and provides evidence of the favorable antigenic properties of D2-FLM. We agree with the reviewer that definitive animal studies of vaccine efficacy are needed; these are currently underway. To avoid misleading our audience, we have toned down the emphasis on vaccine use in the text.

      Line 150-153: Figure 3A demonstrates that the FL-specific antibodies broadly do not neutralize the mutant viruses. However, the conclusions are overstated in the text. 1N5 neutralizes the D2-FL variant.

      The text has been updated to more accurately describe the 1N5 neutralization data.

      Lines 175-182: The authors make a lot of assumptions about the target of the polyclonal target without any evidence.

      These lines reference studies that showed greater enhancement by antibodies targeting the fusion loop and prM than by other cross-reactive antibodies. The assumption, shared by our manuscript and others, is that Abs that are cross-reactive and weakly neutralizing are more prone to cause ADE. As discussed, other groups have mutated the FL in recombinant E protein with a similar goal of removing the fusion loop epitope to reduce ADE. We have rewritten the sentence as follows:

      “As FL and prM targeting Abs are the major species demonstrated to cause ADE in vitro, we and others hypothesized these Abs are responsible for ADE-driven negative outcomes after primary infection and vaccination,10–12,32 we propose that genetic ablation of the FL and prM epitopes in vaccine strains will minimize the production of these subclasses of Abs responsible for undesirable vaccine responses. Indeed, covalently locked E-dimers and E-dimers with FL mutations have been engineered as potential subunit vaccines that reduce the availability of the FL, thereby reducing the production of FL Abs.33–36”

    1. Author Response

      Reviewer #2 (Public Review):

      Please note that I am not a structural biologist and cannot critically evaluate the details of figures 1 to 3; my review focuses on the cell biology experiments in figures 4 and 5.

      Paine and colleagues investigated structural requirements for the interaction between the ESCRT-III subunit IST1 and the protease CAPN7. This is a continuation of previous work by the same group (Wenzel et al., eLife 2022), which showed that Capn7 is recruited to the midbody by Ist1 and that Capn7 promotes both normal abscission and NoCut abscission checkpoint function. In this article, the structural determinants of the Ist1-Capn7 interaction are characterised in more detail, focusing on the structure of Capn7 MIT domains and their binding to Ist1. Notably, point mutations in Capn7 MIT domains known to mediate binding to Ist1 and midbody recruitment are shown here to be required for abscission functions, as expected from the authors' previous paper. Furthermore, the report shows that a Capn7 point mutant lacking proteolytic activity behaves as a loss-of-function in abscission assays, despite showing normal midbody localisation. These are important results that will help in future studies to understand how the Capn7 protease regulates abscission mechanistically.

      The report is clearly written and the results support the main conclusions. Some technical limitations and alternative interpretations of the data should be discussed in the text, as outlined below.

      1) It is not always clearly stated how the results presented in this report relate to those in the Wenzel paper. For example, the finding that Ist1 recruits Capn7 to midbodies (p. 6 and figure 4) was first shown in the Wenzel paper. The novelty here is not that Capn7 MIT mutants fail to localise to midbodies, but that they phenocopy the previously described knockdown of Capn7, failing to support normal abscission and NoCut function (fig. 5). This supports and extends the findings of Wenzel et al. It is important to make this explicit and explain the conceptual advances shown here more clearly.

      We take the reviewer’s point and we have now clarified this issue in the text (e.g., page 7, lines 4-5).

      2) The NoCut checkpoint can be triggered by chromatin bridges, DNA replication stress, and nuclear basket defects, but only basket defects are tested here. Therefore, it is not clear if NoCut is still functional in Capn7-defective cells after replication stress and/or with chromatin bridges. Ideally, this should be tested experimentally, or alternatively discussed in the text, especially since the molecular details of how NoCut is engaged under different conditions remain unclear. For example, "abscission checkpoint bodies" proposed to control abscission timing form in response to nuclear basket defects and aphidicolin treatment, but not in the presence of chromatin bridges (Strohacker et al., eLife 2021).

      We appreciate the reviewer’s excellent suggestion. We have now performed the requested experiments and added a new figure showing that CAPN7 is also required to maintain the NoCut checkpoint when it is triggered by DNA bridges (new Figure 6A) or by replication stress (new Figure 6B).

      3) The current data suggest that Capn7 is a regulator of abscission timing, but in my opinion do not quite establish this, for two main reasons. First, abscission timing is not directly measured in this study. Time-lapse imaging would be required to rule out alternative interpretations of the data in figure 5. For example, a delay in an earlier cell cycle stage could in principle lead to a decrease in the overall fraction of midbody-stage cells. Second, the absence of the midbody is not necessarily a marker of complete abscission. Indeed, midbody disassembly is associated with the completion of abscission in unchallenged HeLa cells, but not in cells with chromatin bridges (Steigemann et al, Cell 2009). Midbodies remain a useful marker for pre-abscission cells, but the absence of midbodies should not be immediately interpreted as completion of abscission without further assays. Formally, a direct measurement of abscission timing would require imaging of the plasma membrane, for example using time-lapse phase-contrast microscopy (Fremont et al., 2016 Nat Comm). These limitations should be mentioned in the text.

      We note that midbody numbers are not our only measure of abscission delay/failure - we also measure the numbers of multinucleate cells and sum the two. Nevertheless, we understand the reviewer’s point and have therefore noted that we are using increased frequencies of cells with midbody connections and multiple nuclei as surrogate markers for abscission defects and NoCut-induced abscission delays (page 7, lines 13-14 and line 17).

      4) IST1 plays a role in nuclear envelope sealing by recruiting the co-factor Spastin (Vietri et al., Nature 2015), a known IST1 co-factor also confirmed in the previous interactome screen (Wenzel et al. 2022). CAPN7 could have a role in maintaining nuclear integrity upon the KD of Nup153 and Nup50 (Mackay et al. 2010) instead of/in addition to its proposed role in delaying abscission as part of the NoCut checkpoint at the midbody. I don't think the authors can differentiate between these two possibilities, and it would be interesting to consider their possible implications on how the "NoCut" checkpoint is triggered.

      The reviewer again makes good points, and we agree that in addition to participating in abscission, CAPN7 may be involved in closure of the nuclear envelope and that nuclear envelope closure may, in turn, be linked to satisfaction of the NoCut checkpoint. This involvement would nicely explain our observations that both SPAST and CAPN7 participate in both NoCut and abscission. We are in an unusual situation, however, because other colleagues in our field have told us in private communications that they observe that CAPN7 does, in fact, participate in nuclear envelope closure. We find that observation interesting and exciting but it is their discovery, not ours, and we have therefore refrained from doing analogous experiments ourselves. As a compromise, we have added the following text to the penultimate section of our paper (page 8, lines 34-35 through page 9, lines 1-11):

      “Our discovery that both CAPN7 and SPAST participate in the competing processes of cytokinetic abscission and NoCut delay of abscission may appear counterintuitive, but we envision that the MIT proteins could participate in both processes if they change substrate specificities or activities when participating in NoCut vs. abscission; for example, via different sites of action, post-translational modifications, and/or binding partners. We note that, in addition to its well documented function in clearing spindle microtubules to allow efficient abscission (Yang et al., 2008), SPAST is also required for ESCRT-dependent closure of the nuclear envelope (NE) (Vietri et al., 2015). The relationship between NE closure and NoCut signaling is not yet well understood, and it is therefore conceivable that nuclear membrane integrity is required to allow mitotic errors to sustain NoCut signaling. It will therefore be of interest to determine whether or not CAPN7, in addition to its midbody abscission functions, also participates in nuclear envelope closure and, if so, whether that activity is connected to its NoCut functions.”

      We think that this additional text explains what we (and the reviewer) consider to be an attractive model, but leaves open the question of CAPN7 involvement in nuclear envelope closure to be resolved by our colleagues.

      5) Figure 5 should include images of representative cells, highlighting midbody-positive and multinucleated cells. Without images, it is not possible to evaluate the quality of these data.

      We appreciate this suggestion and have now added images showing midbody-positive and multinucleated cells from the quantified datasets to allow assessment of our data quality (new Figures 5B and 5D).

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Unfortunately, this paper adds only a little to our understanding of uptake into the flagellar pocket of trypanosomes. It tends to add only detail to information that has been well characterised elsewhere and indeed, as the authors themselves point out (lines 92-98), it is rather incremental.

      We were disappointed that the reviewer was so unsupportive of the work presented here. It seems possible that the reviewer is partly objecting that the title - which emphasised the main finding of the paper - does not fully capture the content of the paper. We have therefore modified the title to emphasise that the paper is principally a characterisation of TbSmee1 rather than an investigation of the flagellar pocket, with the insight into cargo entry being the most notable finding.

      Not only has TbSmee1 been studied before but this data in bloodstream forms is not particularly novel since it gives much the same information as the canonical hook protein TbMORN. This work follows the pattern of conclusions made previously with the protein TbMORN. It focusses on the protein TbSmee where RNAi mutants are interpreted to show flagellar pocket enlargement and impaired access by surface bound cargo. Unfortunately, there is little mechanistic or functional conclusion to the study in terms of how TbSmee operates naturally in the cell.

      This is deliberately downplaying the value of the work. TbSmee1 has not previously been characterised in bloodstream form cells, and neither TbMORN1 nor the hook complex are as well-characterised as other cytoskeletal components such as the flagellum and basal body. To criticise the paper for not providing a molecular mechanism of TbSmee1's function is unreasonable given the volume of work provided and the fact that this is a first characterisation of the protein in this life cycle stage. Expectation of a complete molecular mechanism is setting a very high bar for a first characterisation.

      It is also possible that the reviewer has not grasped the main thrust of the argument - when TbMORN1 was characterised it was the first protein shown to have this cargo entry defect. We show here that not only does TbSmee1 share this defect, but that it is in fact a previously-unacknowledged feature of all phenotypes of this type, exemplified by clathrin. We have modified the text to make this finding more clearly emphasised (see for example lines 654-661 in the tracked-changes version of the manuscript).

      There are other possible explanations for the phenotype. That would need to be studied. This large flagellar pocket phenotype is seen with RNAi mutants of many different types of proteins in the trypanosome and so pleiotropic effects are highly likely. Also, there are a good number of alternative possibilities to account for reduced access to the pocket in these mutants and this data could be usefully added.

      This is another statement that seems intended primarily to disparage the paper rather than attempt to improve it. It would have been extremely helpful if the reviewer highlighted what these other possible explanations are instead of making vague allusions. The widespread prevalence of this kind of phenotype means that our insight into restricted cargo access to the flagellar pocket is of general relevance in the trypanosome field.

      Specific points

      1. The transient location for the TbSmee at the FAZ tip - or in this case the groove region - was seen in procyclics (Perry, 2018) so this bloodstream indication merely confirms that concept.

      The reviewer is again downplaying the value of the work rather than providing constructive criticism. While FLAM3 has been shown to be at the tip of the new flagellum in bloodstream form cells (Sunter et al., 2015), at the time of the preprint being published Smee1 was actually the first protein (besides the DOT1 antigen) shown to localise to the groove region in bloodstream form cells. It is also worth noting that procyclic form cells and bloodstream form cells are fairly different in this regard - in procyclic cells, there is an entire flagellar connector structure that is not present in bloodstream form cells, and so demonstrating that Smee1 was present in the groove region was an important experiment. Since this preprint was published, Smithson et al. have identified 13 additional proteins localising to the groove (Smithson et al., 2022) - we have modified the text to include these points (see lines 542-545 of the tracked-changes manuscript).

      1. The C terminal region required for targeting is a reasonable deletion analysis of regions of the protein. But can this data (line 228) be said to "mediate targeting" - or is it just required. For instance, targeting might be OK but it might be needed for stable association, etc etc.

      We have changed the text to say "required" for targeting instead of "mediating" targeting (line 312 of tracked-changes manuscript).

      1. This protein has already been shown to be phosphorylated and the sites and cell cycle possibilities have been mapped by Urbaniak. So that section adds little. https://doi.org/10.1371/journal.ppat.1008129

      The reviewer is again disparaging the significance of the work rather than critiquing it. This is after all only a single panel of a figure and ~15 lines of text, and therefore a minor but still noteworthy element of the manuscript. This also misunderstands what the Urbaniak study does and does not show - while that work showed that Smee1 is phosphorylated, it remained possible that other post-translational modifications were occurring. This experiment shows that the "fuzzy" appearance (variable electrophoretic migration) of TbSmee1 in gels can be solely attributed to phosphorylation as opposed to other post-translational modification. We contacted Dr. Urbaniak to confirm this - his answer is below.

      "I think your approach to look at the fuzzy banding is actually rather elegant; our data shows that phosphorylation occurs but we did not look for any other PTMs that could influence migration on a gel and probably wouldn't see them without a different enrichment and analysis method. We often see a fuzzy pattern with glycosylation due to the heterogeneity, and I suspect other modifications will also result in a smear. Given that the band collapses to a single band after phosphatase treatment and not with an inhibitor present it is fair to conclude that phosphorylation is responsible for the fuzzy band, not other undefined PTMs like glycosylation."

      1. Essentiality in BS forms and pocket enlargement. This is not surprising. A very large number of cytoskeletal proteins show this in RNAi knockdown. Flagella mutants (extensive publications from many groups (Hill, Bastin, Gull, etc.) over the last 15 years) show this very well and so this protein is just one more example.

      This appears to be another comment aimed at downplaying the value of the manuscript rather than providing constructive feedback. The fact that we have demonstrated something previously unobserved in a common phenotype makes the data of general interest to the community, we feel.

      1. I didn't find that the explanations for flagella pocket enlargement are soundly based. The experiments focus on endocytosis and uptake and ignore other plausible reasons and some evidence in literature.

      Again, the reviewer's feedback would be considerably more constructive if they had taken the time to specifically cite the evidence in the literature that they are alluding to, and present some of the "other plausible reasons" they are aware of. We have consulted widely in the community and have not been able to find anybody who knew what work the reviewer is referring to here.

      Lines 84/85. Enlarged pockets may be indicative of endocytosis failure. Presumably the rationale is that endocytosis fails, but exocytosis still occurs and the pocket membrane enlarges. What evidence is there that exocytosis of membrane still occurs? This simple concept might indeed operate in a clathrin mutant but is surface membrane/content exocytosis maintained in these cytoskeleton mutants? There is good evidence for glycoconjugates within the flagellar pocket. Are these depleted or present still?

      The reviewer is correct that we have not specifically assayed for exocytosis, but the fact that we are able to make the same observations in both the clathrin RNAi (where exocytosis has been assayed - Allen et al., 2003) and the Smee1 RNAi means that this is not a problematic omission. The effect of the enlarged flagellar pocket phenotype on the glycoconjugates in the flagellar pocket is an interesting question but far outside the current focus of the paper.

      1. There are also a number of other publications indicating that clathrin pits are still present on the enlarged pockets of various mutants when viewed by EM. The authors have looked at the flagellar pockets by EM but the EM methods described have extensive washings and centrifugations before fixation. This is a very poor approach and will mean that endo and exocytic traffic is disturbed (extensive references in literature in other systems). This is not a useful approach for exo/endocytosis studies where flux of traffic demands fast chemical or freezing fix in media.

      The reviewer has misunderstood the aim of the experiments described in Figure 5D, which was to observe the morphological changes caused by depletion of TbSmee1. As the reviewer is no doubt aware, high-pressure freezing of trypanosomes gives much better morphological preservation than chemical fixing in media, so the choice of method is not "very poor" but tailored to the experimental aims. We have modified the text to make this point more clearly (lines 355-358 of tracked-changes version). Once again, the referee offers no citation to back up their assertion that endo- and exocytic traffic is disturbed by wash steps, either in trypanosomes or elsewhere.

      1. The EMs and light microscopy do show that the mutant pockets are substantially abnormal in their cytoskeletal arrangement. They have multiple flagella profiles, flagella structures have not connected with the membrane and are sometimes in the cytoplasm (see a glimpse of the paraflagellar rod in the cytoplasm in Fig S5C and internalised FAZ attachment plaques in Fig 4D bottom right cell). Given these extensive (and expected) cytoskeletal abnormalities it is highly likely that these pocket abnormalities are a result of motility, cell division/developmental issues and the differential uptake phenotypes merely consequential.

      This is another misinformed argument that is seeking to disparage the data. The reviewer has apparently overlooked the fact that the same phenotype is seen in clathrin RNAi, when flagellar pocket enlargement precedes any downstream effects on cell division cycle progression. We have gone to great lengths (Fig 6) to demonstrate that the enlargement of the flagellar pocket almost certainly precedes the onset of the growth defect in the TbSmee1 RNAi, and it is therefore likely to precede the cytoskeletal abnormalities that the reviewer has highlighted. An effect on cellular motility is possible and would be interesting to investigate in future work.

      1. The authors speak about early phenotypes, but these are often at 15-24 hours. That is probably a couple of cell cycles and so not early.

      To be informative, the analyses of RNAi phenotypes have to be done as soon as possible after the onset of the growth defect, and we have gone to great lengths (Figure 5) to define this point as being at 21 hours. This is already difficult as the number of phenotypic cells at the onset of the growth defect will not be high. We have clarified the text to emphasise that "early" refers to soon after the onset of the phenotype (lines 388-389 of tracked-changes version).

      In relation to the above question of comparison to the same morphology produced by flagella mutants it would be good to know if these hook mutants produce motility phenotypes and whether these are manifest before the uptake phenotypes. There is evidence (cited here) that forward motility of the trypanosome directs material on surface into the pocket. If these cells have motility defects (primary or via failed division) then surely that would provide an alternative simple explanation for uptake differences.

      The reviewer is overlooking the observation that the surface-bound endocytic cargoes (ConA, BSA) are still being sorted/directed as far as the entrance to the flagellar pocket - what is interesting is that the cargo is apparently unable to enter the flagellar pocket. As noted above, it would certainly be interesting to look at motility effects in follow-up work.

      1. There is a general point that if studies are to have real relevance to uptake in the trypanosome then they need to deal with uptake of natural ligands rather than artificial surrogates such as dextran. Such tracers were used historically, but in the last decade a series of receptors and ligands for fluid phase and particularly membrane mediated endocytosis have been discovered. With the investment of a little time these important ligand / receptors such as haptoglobin, transferrin, etc would be much more relevant.

      Dextran is still state-of-the-art as it is an inert fluid phase marker. We are not aware - and have asked widely - of any readily-available alternative to dextran as a fluid phase marker, especially seeing as we have demonstrated in this study that BSA does not behave as a fluid phase marker in the experimental conditions used. The reviewer is also being disingenuous in suggesting that there is a panel of validated physiological reporters for trypanosomes that are readily available commercially - this is not the case. Transferrin is probably the only example, but the transferrin receptor is confined to the flagellar pocket and therefore not relevant to the question of how surface-bound material enters the flagellar pocket in the first place. As suggested by Reviewer 3 and endorsed by Reviewer 2, we have looked at the uptake of anti-VSG antibodies (which are a physiological cargo) in additional experiments and obtained evidence that the same effects are seen (Figure 9).

      Referees cross-commenting

      This session includes comments from Reviewer 1 and Reviewer 2.

      Reviewer 2

      Dear Reviewers 1 and 3:

      I agree with many of Reviewer 1's points and our divergence is partly a matter of degree. While it is true that this manuscript is incremental in its contribution to our understanding of TbSmee1, it nonetheless adds to our understanding of the role of this protein in the bloodstream life stage and because of that I find value in the work. The fact that it mirrors what was seen in other protein knockdown studies (e.g. TbMORN) doesn't negate its contribution for me. Reviewer 1 makes an important point, however, when stating that this work does not add a mechanistic or functional conclusion as to how TbSmee1 operates and for me that is the biggest shortcoming of the work. Offering mechanistic insight is a high bar and while it would make for a much more exciting story it does not discount the value of the work as presented. What I do appreciate is the speculation about this observation that endocytosis is required for entrance of surface bound material into the pocket and although they are unable to show that this is not a side effect of other processes being disrupted it is an intriguing point. These observations have the potential of stimulating further investigations into crosstalk between the entrance to the pocket and endocytosis. I also agree that the use of ligands for known receptors like transferrin would be far more informative. While I assumed the transferrin receptor was in the pocket itself it would be interesting to see if the ESAG6/7 is also located outside the pocket and transiently binds cargo before being brought inside for endocytosis.

      I think that Reviewer 3 brings up a great point with the focus on VSGs. I think that examining VSG turnover in these mutants can add value to the analysis and inform our view of how affecting the hook complex alters VSG endocytosis.

      We appreciate Reviewer 2 taking the time to defend the value of the work, and we concur with Reviewer 2's assessment. Reviewer 2 is also correct that the transferrin receptor appears to be primarily or wholly confined to the flagellar pocket interior, making this likely less informative in this context. Concerning the uptake of anti-VSG antibodies highlighted by Reviewer 3 and endorsed by Reviewer 2, we have carried out these experiments and obtained similar results to those published in the first version of the preprint (Figure 9).

      Reviewer 1

      Some fair comment and agreement. This is being sent to general cell biology journals.

      When one looks at this area in the round, it is nearly 50 years (1975) since Langreth and Balber published their seminal work on protein uptake and digestion in bloodstream and culture forms of T. brucei. There have been 50 years of intense study and the genome has been around for nearly 20 years as well. So, put simply - for both a general science audience and the wider parasite community - if this is a paper about one protein, TbSmee1, then it surely has to say something functional about that protein. If it is a paper about uptake in trypanosomes (where mutants are one means of interrogation) then it surely has to say something about mechanisms of uptake of physiologically relevant ligands. The days of dextran etc are past.

      Hence, my comment that this does neither and so is very incremental to what is known already. It is 2022 not 1975. Langreth and Balber published their seminal work in J Protozoology for very good reasons no doubt.

      It is striking that Reviewer 1 here extends their aggressive and uncivil approach to attack Reviewer 2's assessment, again substituting forceful wording for informed argument. Reviewer 1 again inexplicably and mistakenly criticises the use of dextran when no state-of-the-art alternative exists. They then go on to needlessly disparage the work done by Langreth & Balber when this work was produced in a totally different publishing landscape. They also appear to fundamentally misunderstand the Review Commons concept, which is to provide journal-independent preprint peer review; it is also worth noting that there are specialist journals such as PLoS Pathogens in the RevComm affiliates as well as general cell biology journals. Given that the mechanism of variant surface glycoprotein (VSG) switching has not yet been fully articulated despite the efforts of multiple labs and many projects over a decades-long time period, it seems extremely unreasonable to be making such demands of this paper.

      Reviewer 2

      Thank you for replying and I agree with the spirit of your critique. My only comment, which could result from my own naivete, is to say that despite the incredible work that has been done in dissecting endocytosis in T. brucei over these past 50 years, it appears that we still do not understand how many fundamental aspects of this activity work in this parasite. Even basic questions, such as how the binding of cargo, e.g. transferrin, to surface receptors is sensed by the parasite, remain unanswered, and the identity of the specific signaling components which transmit this information internally to initiate endocytosis has not been characterized. In many ways it seems that we don't even understand how the parasite partitions the endo/exocytic pathways in the pocket and maintains membrane homeostasis. While we know that some kinases and traditional signaling components must be involved, a high resolution understanding of this process in T. brucei seems lacking. I only say all this to suggest that the field maybe isn't yet so advanced as to reject work of this type, as so many mechanistic unknowns still remain to be uncovered and maybe incremental advances and phenomenology can still add value to the field. However, I respect your opinion on the matter and my perspective could be due to a lack of a full appreciation of the literature on the subject.

      We completely agree with Reviewer 2's assessment here, which neatly summarises our rationale for the present work. Reviewer 2 is, if anything, being overly accommodating by suggesting that their perspective may be due to a lack of a full appreciation of the literature - on the contrary, Reviewer 2 appears to have a very sound grasp of the topic.

      Reviewer #1 (Significance):

      Unfortunately, I did not find this to be very significant. It covers old ground in terms of the phenotype described. Many groups have shown the differences between procyclic and bloodstream phenotypes in this enlarged pocket phenomenon. The work is rather incremental from these and other authors' work on these hook proteins.

      There are alternative explanations for understanding the effect of flagella pocket structure and uptake of ligands into the pocket and trypanosome cell. These would need to be studied before one could see a functional, mechanistic link established.

      Other parts of this are nicely done but do not move on our understanding (e.g. targeting/phosphorylation) from what has been done previously.

      As noted repeatedly, it appears that Reviewer 1's priority is disparaging the value of the work here and downplaying its significance rather than providing constructive feedback. The reviewer repeatedly makes unrealistic demands (a mechanistic model, use of non-standard reagents), misunderstands the aim of experiments (use of high-pressure freezing), makes vague allusions to other work in the literature but without citing anything specific to support their case, and makes strong and assertive statements that are factually incorrect (design of RNAi experiments, use of dextran). We find this approach unhelpful, uncivil, and unprofessional. It is desperately disappointing that we should have to spend the majority of our response rebutting Reviewer 1's comments rather than implementing constructive criticisms that would strengthen the manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary:

      In this manuscript the authors have advanced our understanding of the hook complex component TbSmee1 through a detailed analysis of this protein's role in the endocytosis of surface bound proteins via the flagellar pocket in bloodstream form Trypanosoma brucei. The TbSmee1 protein, previously identified using proximity labeling using TbMORN1 and TbPLK, and characterized in procyclic T. brucei, was confirmed to target to both the shank portion of the hook complex as well as the growing end of the new FAZ in replicating cells. The protein was also shown to likely be phosphorylated as had been suggested previously due to its association with the kinase TbPLK. A domain deletion analysis demonstrated that domains 2 and 3 are important for TbSmee1's proper localization to the hook complex. Loss of TbSmee1 using RNAi based knockdown resulted in a quick cessation of growth in the bloodstream form within 24 hours in contrast to what was seen previously in procyclic cells which had only a decreased growth rate. Loss of TbSmee1 also resulted in an enlargement of the flagellar pocket and in many ways mirrored the phenotype observed with knockdown of TbMORN1. Although prior work on TbSmee1 in procyclic T. brucei demonstrated that loss of this protein altered the morphology of TbMORN1, no such change was seen in bloodstream form cells and only an alteration in the morphology of TbLRRP1 was observed. In characterizing the effect of TbSmee1 depletion on endocytosis the authors showed that the fluid phase marker Dextran could enter into the flagellar pocket of TbSmee1 depleted parasites while the surface bound ConA and BSA remained outside of the flagellar pocket suggesting that TbSmee1 may play a role in allowing larger protein components into the pocket regions. Similar observations were also previously seen with TbMORN1 depletion.

      Importantly, a knockdown of clathrin recapitulated the TbSmee1 knockdown phenotype suggesting that endocytosis itself was required to allow material bound at the surface to enter into the flagellar pocket. In addition to adding to our understanding of hook complex components, this work raises some interesting questions regarding the role of the hook complex in facilitating endocytosis in this important human pathogen.

      Thank you for the positive assessment.

      Major Critiques:

      This is a superbly written manuscript with robust high-quality data that strongly support the major conclusions made by the authors. The flow of the article is logical and easy to follow, making it accessible to a wide array of readers.

      We are glad that the Reviewer appreciated the effort that went into writing the paper.

      Although I appreciate the brevity of the introduction and how the article gets straight to the point, additional background information on the components and function of the flagellar pocket collar protein could help contextualize the goals of the project. The way in which the flagellar collar structures are introduced to the reader is quite abrupt (beginning on line 75) and simply states the names of TbBILBO1, the centrin arm and hook complex as simple facts without much discussion about the background of these components/regions. A graphical representation of the centrin arm or hook complexes relative to other components like the pocket itself, FAZ or axoneme could make following the story much easier. An expansion of this background could also go a long way to convince readers of the importance of this region in the basic biology and virulence of T. brucei.

      Implemented. We have added more background details on the hook complex, flagellar pocket collar, and centrin arm and added a new schematic image to Figure 1 showing these structures as well as the FAZ (Figure 1A).

      On lines 84-86 the authors cite the way in which 'small' vs 'large' macromolecules enter into the pocket without defining what exactly is meant by these terms as they are relative in nature. Setting some boundaries of size could provide some context to the reader.

      Implemented. We have provided more detail on the approximate sizes in nm (lines 110-113 of tracked-changes manuscript).

      In the domain localization analysis beginning in Figure 4 there is a missed opportunity to also assess which portions of the TbSmee1 protein are important for overall function as well. By either an examination of dominant negative phenotypes resulting from overexpression of the truncated mutant or the expression of the truncated forms designed to be RNAi resistant in the TbSmee1 knockdown cell line, one could also assess which portions of this protein are essential for endocytic function in addition to targeting. Is there a reason this was not performed?

      This is a good point; we did actually investigate overexpression of the TbSmee1(161-766) construct which can target correctly but is missing the first folded domain, but did not observe any phenotypic effects. We have added this point to the results (lines 301-302 of tracked-changes version). We agree that it would be interesting to express the truncations in a TbSmee1 RNAi background in order to simultaneously assay for targeting and function, but this was (unfortunately, perhaps) not part of the original experimental design. To do so now would require generating a completely new panel of truncation constructs with recoded DNA (in order to make them RNAi-resistant) and then generating a new panel of cell lines. While this would be informative, we feel that it would be impractical at present.

      In the analysis of viability changes due to TbSmee1 depletion (line 237) the authors state that at "72 h post-induction showed widespread lysis, ..." This phenotype seems inconsistent with other related endocytic defect mutants. There is no further mention of this lysis phenomenon here or in the discussion and considering how unique this seems it deserves either additional data to demonstrate or further discussion as to the basis of the phenotype. It seems, at least from this study of TbStarkey1 and prior studies which result in the enlarged flagellar pocket phenotype, that having an enlarged pocket is not the cause of lysis and doesn't even naturally lead to a growth defect.

      Widespread lysis is the usual outcome of bloodstream form cells with strong endocytic defects - we have observed this directly for the clathrin, TbMORN1, and TbSmee1 RNAi cell lines, and it has been documented in a number of other publications (see for example Natesan et al., 2010, Manna et al., 2017). We have clarified this point in the text (see for example lines 359-341, 474-478 of tracked-changes manuscript).

      The authors do not comment on what is the source for the cessation in growth following TbSmee1 knockdown. Is it nutrient deprivation like in other endocytic defect mutants?

      Implemented (see for example lines 359-361, 605-610 of the tracked-changes manuscript). The source of the growth defect is likely to be due to impaired cell division cycle progression due to the gross enlargement of the flagellar pocket and subsequent steric hindrance and imbalance of membrane homeostasis.

      In the end, one of the most interesting observations made by the authors is that loss of TbSmee1 inhibits endocytosis and this has the appearance of not allowing large molecule substrates like ConA and BSA to enter into the flagellar pocket. This appeared to have nothing to do with a gatekeeping type function of the hook complex/flagellar collar and instead, as shown through clathrin knockdown, was related to the ability of the parasite to endocytose. There are a lot of potential interpretations of this phenomenon with one being a simple perturbation of the normal membrane trafficking to and from the flagellar pocket being involved. An analysis of knockdown of exocytic components might reveal whether or not this inability to enter into the pocket is also seen when exocyst proteins are also depleted. It may be impossible to tease apart these two interrelated activities but it might eliminate one side of the equation if these proteins can still enter the flagellar pocket when exocytosis is perturbed, although this reviewer understands that that dimension of T. brucei membrane trafficking is poorly understood relative to endocytosis.

      This is an interesting point, and the reviewer is also correct in highlighting that exocytosis is far less characterised than endocytosis in Trypanosoma brucei. The exocyst has been characterised in bloodstream form T. brucei (Boehm et al., 2017) and shown to also have a role in endocytosis, so teasing out the relative contributions of these pathways would undoubtedly be challenging. We would prefer not to go in this direction in this present study, but it is an obvious avenue for future work.

      An intriguing possibility that the authors allude to and which if answered would make this manuscript have a far broader appeal is to determine if loss of TbSmee1 alters the lipid kinase distribution and if this is the source of the negative impact on endocytosis. One important dimension of endocytosis in T. brucei which remains poorly understood is the role of signaling machinery in triggering endocytic events. It is possible that the hook complex serves as the gatekeeping or signaling platform that recruits signaling components (like lipid kinases) that identify and/or modify the membrane lipid phosphatidylinositols harboring cargo laden receptors thus marking them for endocytosis within the pocket. It still seems unclear when in the process of endocytosis is the decision made to pull things into the pocket but it seems that the assumption is that this occurs deep within the pocket. This data suggests that there is possibly another decision point prior to being allowed entrance into the pocket. It may be that this isn't a gatekeeping decision but rather a stop vs. go activity where once cargo laden membrane reaches the collar a choice is made to pull this material in or not there and not after material is already in the pocket.

      These are all really interesting ideas and would be fascinating topics for future work.

      This obvious enigma, based on the observation that loss of hook complex components affects the spatially separated site of endocytosis, supports the idea that the actual endocytic signaling platforms are located at the hook complex and that this area may make the membrane modifications that mark membrane as being ready to be endocytosed via clathrin-coated vesicles at the bottom of the pocket. This would still allow for fluid phase small molecule entrance which does not require binding to surface proteins. The obvious problems of having both endo/exocytosis occurring in the same close proximity make the dissection of this phenomenon difficult but it is worth potentially expounding on further in the discussion as this idea is very appealing and adds an important dimension to our understanding of endocytosis in this organism.

      Implemented (lines 722-727 of the tracked-changes manuscript). We have added some more detail to these points in the Discussion. We agree with the reviewer that there are some profoundly interesting questions concerning membrane identity and membrane protein uptake here.

      Minor Critiques:
      The authors commit significant time to the analysis of the phosphorylation of TbSmee1, but there is little stated about the role of TbPLK in this activity or the potential connection of TbSmee1 phosphorylation to the cell cycle. Would a knockdown of TbPLK using RNAi potentially demonstrate an altered migration of TbSmee1 due to a lack of phosphorylation? An analysis of radiolabeled TbSmee1 using p32 in vivo would likely support this claim as well. Has mass spectrometry identified potential phosphorylation sites to examine? Additionally, the loss of TbSmee1 has been shown to disrupt localization of TbPLK in procyclic cells and so why this was not also assessed in bloodstream form cells subjected to RNAi was not clear.

      Partly implemented. We have added some discussion of the possible role of TbSmee1 phosphorylation in the cell cycle to the Discussion (lines 562-565 of tracked-changes manuscript), and emphasised the identification of phosphorylation sites in previous phosphoproteomics work (citations of Nett et al., 2009, Urbaniak et al., 2013). Given that the strongest and earliest effect of TbSmee1 depletion was on endocytosis and cargo uptake, we chose to focus on this angle rather than exploring its contribution to the biogenesis of cytoskeleton-associated structures and its interaction with TbPLK. For that reason we would prefer not to carry out the experiments looking at the effects of TbSmee1 depletion on TbPLK or vice versa.

      In the results section (lines 104-108) a model of the protein structure as predicted for example by AlphaFold might be informative and complement the domain analysis work depending on the quality of the prediction.

      Implemented. The AlphaFold prediction is consistent with the predictions made by the other structural analyses, and we have noted this in the text (lines 145-148 and 551 of the tracked-changes version).

      There is an arrow in the Figure 1B Western blot but I can find no mention of what it is trying to highlight in the text.

      Corrected.

      For Figure 1D there is no loading control or control for the distribution of the soluble fraction to validate the separation of the two compartments.

      Implemented. We have carried out additional experiments to show the partitioning of a cytoplasmic protein (the endoplasmic reticulum chaperone BiP) into the detergent-soluble fraction. These results are now displayed in the updated Figure 1.

      The authors fail to comment on the difference between the lack of changes in hook complex components that they see and the changes observed by Perry et al. (2018). This difference merits some minor comment or speculation.

      Implemented. We have added this commentary to the Discussion (lines 592-600 of the tracked-changes version).

      Line 228: domain should be capitalized.

      Implemented.

      Line 230: FigS5C should have a space and period after Fig. and S5C.

      Implemented.

      Line 244: "on" should be inserted in the sentence "...TbSmee1 protein depletion ON either side of the onset..."

      Implemented.

      Line 400: the '...20/21 h post-induction...' is slightly confusing and may read better as 20-21 h.

      Implemented.

      Line 463: a space is needed between '...2009).The...'.

      Implemented.

      Reviewer #2 (Significance):

      This manuscript advances our current conception of endocytosis in T. brucei. Although this model kinetoplastid parasite has been extensively studied with respect to endocytosis there is still a great deal we do not yet understand regarding how this process is regulated at a mechanistic level. This work has begun to connect previously unappreciated aspects of endocytosis in T. brucei by highlighting a potentially novel connection between the flagellar collar/hook complex and the physically separated endocytic events within the flagellar pocket itself. It may be that what appears as regulated entrance into the pocket is in fact the source of signaling that triggers the endocytic events carried out by clathrin. This is an interesting notion that no doubt requires further investigation which lies outside of the scope of this report. While this work appeals primarily to those studying kinetoplastids parasites it has the potential to provide insight into basic protozoan biology as well. Due to my related interest in kinetoplastid endocytosis, I find this work to be of high quality, conceptually interesting and employs many of the cutting-edge techniques currently available in the study of T. brucei.

      We are very happy that the Reviewer formed a favourable impression of the work.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This manuscript begins to dissect the function of the hook complex protein SMEE1 in the mammalian infective form of T. brucei. The hook complex is a cytoskeletal structure associated with the flagellar pocket, the only site of endo/exocytosis in these cells. The authors demonstrate that SMEE1 is required for endocytosis in these cells and that this can occur with minimal change to the molecular make-up of the hook complex. The authors show that endocytosis is important for the access of large molecules e.g. ConA into the flagellar pocket.

      Major comments

      The key conclusions of this study are convincing and the data is generally well presented and clear. The interpretation of the figures matches well with the data presented - there are a few minor issues though that I have highlighted below in minor comments. The authors use a range of molecular cell biology approaches to define the role of SMEE1 and these are appropriate and are well controlled.

      Thank you.

      My major comment focuses on the use of different tracers to study endocytosis but the elephant in the room is what is happening to VSG as this is the surface protein that needs to be rapidly removed from the cell surface and cleaned. Given the importance of removal of antibodies bound to the VSG - have the authors looked at this in the SMEE1 depleted cells? Do VSG-antibody complexes accumulate in this region? This is an important experiment as this would give key physiologically relevant data to this study. All the material should be readily available for this as there are a number of VSG antibodies.

      We agree with the Reviewer that the behaviour of these VSG-bound antibodies is a key test of the physiological relevance of the observations we have made using ConA and BSA, and have implemented this request - the results are in the new Figure 9. Although they sound simple, these assays turned out to be far from trivial and much more technically challenging than the other uptake assays, owing to the extremely fast kinetics (seconds) of anti-VSG uptake (Engstler et al., 2007) and the unexpectedly and incredibly high losses of bound antibodies during the assay. This might be due to shedding, as noted in the Discussion.

      Minor comments
      Perhaps I have been overthinking this but is surface-bound the right way to describe the cargo, as it clearly goes in both directions onto and off the surface and in fact the experiments in this manuscript are focussing on the removal of this material from the surface so is not surface-bound.

      We have clarified that "surface-bound" refers to material that binds to the surface glycoprotein coat of the trypanosomes and which is subsequently internalised, not material that is bound for (i.e. being directed to) the cell surface (lines 77-78 of tracked-changes version). We hope this addresses the Reviewer's point?

      Have the authors investigated the structure of the protein using alphafold and if so how does that compare to the domain structure that was presented in this manuscript?

      Implemented (lines 145-148, 551 of tracked-changes version). We have checked the AlphaFold prediction of the three-dimensional structure of TbSmee1 and noted it in the Results; the prediction is consistent with the earlier bioinformatic analyses.

      The authors raised a number of antibodies to TbSMEE1 and TbSTARKEY1 but it was not clear in the figures which antibody was ultimately used for analysis by western and IF - could the authors clarify, as some looked to have a higher background than others. Line 150 states the same localisation was seen for all three antibodies and references S3C but I couldn't see that data presented.

      Implemented - the 304 antiserum was used for most subsequent experiments and we have noted this in the M&M (lines 793-798 of tracked-changes version). Figure S3C shows that the Ty1-TbSmee1 recapitulates the localisation of the antibodies against the endogenous protein - we have clarified this point as well (lines 206-207 of tracked-changes version).

      Line 169 - can the authors provide more detail about the global correlation methodology as I was unable to follow the details in the methods? Is this a pixel per pixel correlation over the image or on a selected region over the area of potential signal overlap? In figure 2E it appears that BILBO1 signal correlates more closely with the SMEE1 signal than MORN and LRRP1 and from the images that would not seem to be the case. Have I interpreted this figure incorrectly?

      Implemented. The original analysis was a global correlation analysis that was determining whether the signals were correlated with each other regardless of spatial overlap, and we agree with the reviewer that these outputs were non-intuitive to interpret. In the revision, we have carried out a new analysis (and updated the accompanying text and M&M section), measuring the degree of spatial correlation between each pair of signals on a pixel-by-pixel basis over the area of each cell, with a total of 30 cells analysed in each pairing. We believe that this addresses the reviewer's point (see lines 223-243, 963-974 of the tracked-changes version).
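
To make the distinction concrete, the following is a minimal sketch of a pixel-by-pixel correlation restricted to the area of one cell. It is an illustration only and not the authors' actual pipeline: the function name `masked_pearson`, the use of the Pearson coefficient, and the representation of images as nested lists are all assumptions made for this example.

```python
import math


def masked_pearson(channel_a, channel_b, mask):
    """Pearson correlation between two equally sized 2D images,
    computed only over pixels where `mask` is truthy (e.g. the cell area).

    Hypothetical helper for illustration; images are lists of rows.
    """
    a_vals, b_vals = [], []
    for row_a, row_b, row_m in zip(channel_a, channel_b, mask):
        for a, b, inside in zip(row_a, row_b, row_m):
            if inside:
                a_vals.append(float(a))
                b_vals.append(float(b))

    n = len(a_vals)
    if n < 2:
        raise ValueError("mask must select at least two pixels")

    mean_a = sum(a_vals) / n
    mean_b = sum(b_vals) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(a_vals, b_vals))
    var_a = sum((a - mean_a) ** 2 for a in a_vals)
    var_b = sum((b - mean_b) ** 2 for b in b_vals)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel is constant inside the mask")

    # Pixels are paired by position, so spatial overlap drives the result.
    return cov / math.sqrt(var_a * var_b)
```

Because each pixel in one channel is paired with the pixel at the same position in the other, the coefficient is high only when the two signals are bright in the same places; averaging it over many cells (30 per pairing in the revision) would then give a per-pairing summary.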

      The authors have generated a number of different clones and performed experiments on these clones generally more than twice, which is clearly explained in the figure legends but in places the data is then put together and it is difficult to know which experiments/clones it comes from - for example 7C/7F what do those percentages represent? Is this the sum of all experiments? A representative experiment? How many cells per experiment were analysed?

      Implemented. We have double-checked all the figure legends and clarified this point where necessary. Quantifications were always made by compiling data from multiple independent experiments using multiple separate clones - see in particular lines 1323-1324, 1363-1365, 1380-1382 of the tracked-changes version.

      Line 200 - From the image it is not convincing that SMEE1 is slightly behind DOT1 - I agree it looks enveloped but would appear level with the distal end of the DOT1 signal.

      Implemented. We have adopted the Reviewer's wording for this text (line 271 of tracked-changes version).

      For the truncation experiments the authors should explain that these are performed with cells in which the endogenous SMEE1 will be expressed and this may influence the localisation of the truncations, especially as there is no information about whether SMEE1 forms complexes with itself or other proteins.

      Implemented (lines 296-298 of tracked-changes version).

      Figure 4D - should be 1 not T-

      We have relabelled this as "TbSmee1". The values in this column are the immunoblot signal intensities obtained for the endogenous TbSmee1 protein in the -Tet condition. We have also clarified this in the figure legend.

      Line 223 - given the low expression of constructs 2 and 9 I'm not sure it is possible to infer anything from the lack of localisation of these constructs as they appear unstable and would be unlikely to localise to a specific location.

      We have added this caveat to the text (lines 558-562 of tracked-changes version).

      Figure S7 - The images presented were not convincing that there was a reduction in the localisation of LRRP1 to the hook complex on depletion of either SMEE1 or MORN1. The difference looks particularly minor if present at all.

      Agreed, there was some debate in the group about these results. We have changed the text to fit the Reviewer's interpretation (lines 347-348 of the tracked-changes version).

      Line 264 - "implied that the lethal phenotype might be due to a loss of function" - this seems an odd thing to say as it doesn't provide any insight as of course the phenotype is due to a loss of function.

      We have clarified this point (lines 350-353 of the tracked-changes version). We would however disagree with the reviewer that RNAi phenotypes are exclusively due to a loss of individual protein's function(s) - when proteins are present in multiprotein complexes (as is often the case with cytoskeleton-associated proteins), then destabilisation of the complex due to loss of the entire protein can cause the observed phenotype, rather than the loss of the function performed by the individual protein within the complex (this may be a semantic point, however). A very good example of this is with the outer arm dynein complex component LC1 (Ralston et al., 2011) - RNAi against LC1 is lethal because the entire outer arm dynein complex is destabilised, whereas expression of non-functional mutants of LC1 produces viable cells with motility defects due to the specific loss of LC1 function.

      Line 412 - can the authors clarify what they mean by geometric problems?

      Implemented (lines 605-610 of tracked-changes version). We were referring to the fact that enlargement of the flagellar pocket will probably create difficulties for the progression of the cell division cycle.

      Throughout the manuscript can you use log scale for the growth curves.

      Implemented.
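
As a brief illustration of why a log scale suits growth curves: exponential growth becomes a straight line after a log transform, so a growth defect shows up as a change in slope. The sketch below estimates a population doubling time from a least-squares fit to log2-transformed densities. The function name and the example numbers are hypothetical and are not taken from the manuscript.

```python
import math


def doubling_time_hours(times_h, densities):
    """Estimate doubling time from a straight-line fit of
    log2(cell density) against time in hours.

    Hypothetical helper for illustration only.
    """
    if len(times_h) != len(densities) or len(times_h) < 2:
        raise ValueError("need at least two matched time points")

    logs = [math.log2(d) for d in densities]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # doublings per hour
    if slope <= 0:
        raise ValueError("population is not growing")
    return 1.0 / slope


# Made-up example: a culture that doubles every 6 hours.
times = [0, 6, 12, 18, 24]
cells_per_ml = [1e4 * 2 ** (t / 6) for t in times]
estimated = doubling_time_hours(times, cells_per_ml)
```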

      Line 756 - add citation

      Whoops! Implemented (line 1058 of tracked-changes version).

      Line 465/66 - the authors state that the ability of the fluid phase cargo being still able to enter the pocket is evidence that the channel lumen is still open; however, I would think that despite the close apposition of the cell membrane to the flagellar membrane in the flagellar pocket neck region this would be unlikely to impede fluid/soluble material from entering the pocket, as presumably VSG protein can move through this region. This does not alter the ultimate conclusion the authors are drawing but without microscopy evidence for the state of the channel lumen it is difficult to be sure of its status.

      Fair point. We have modified this statement (line 701 in tracked-changes version).

      Reviewer #3 (Significance):

      The flagellar pocket is the key portal into and out of the trypanosome cell and as such has a vital role to play in host-parasite interactions. The flagellar pocket is supported by a number of cytoskeletal structures including the hook complex and the role of these structures in flagellar pocket function are poorly understood. The flagellar pocket is particularly important in the bloodstream form of the trypanosome parasite which infects the mammalian host as it is the route for the surface protein VSG to get onto and off the surface. The VSG is required for antigenic variation and the removal of VSG-antibody complexes helps 'clean' the surface of the parasite. SMEE1 is a component of the hook complex and the manuscript here dissects its role in the mammalian infective parasite and shows that it is vital for the endocytosis of material off the surface. Intriguingly, a block in endocytosis causes a blockage of material outside of the pocket, suggesting a multi-step process in the regulation of uptake of material from the parasite's surface.
      This manuscript will be of specific interest to those researchers investigating the long-term persistence of these parasites in the mammalian host. There are potentially some insights into the control of membrane domains for endocytosis that are of interest to more general cell biologists as well.

      We are very grateful to the reviewer for the supportive comments and the constructive evaluation. Many thanks!

      Expert in molecular cell biology of trypanosomes and Leishmania.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Unfortunately, this paper adds only a little to our understanding of uptake into the flagellar pocket of trypanosomes. It tends to add only detail to information that has been well characterised elsewhere and indeed, as the authors themselves point out, (lines 92-98) it is rather incremental. Not only has TbSmee1 been studied before but this data in bloodstream forms is not particularly novel since it gives much the same information as the canonical hook protein TbMORN.

      This work follows the pattern of conclusions made previously with the protein TbMORN. It focusses on the protein TbSmee where RNAi mutants are interpreted to show flagellar pocket enlargement and impaired access by surface bound cargo. Unfortunately, there is little mechanistic or functional conclusion to the study in terms of how TbSmee operates naturally in the cell. There are other possible explanations for the phenotype. That would need to be studied. This large flagellar pocket phenotype is seen with RNAi mutants of many different types of proteins in the trypanosome and so pleiotropic effects are highly likely.

      Also, there are a good number of alternative possibilities to account for reduced access to the pocket in these mutants and this data could be usefully added.

      Specific points

      1. The transient location for the TbSmee at the FAZ tip - or in this case the groove region - was seen in procyclics (Perry, 2018) so this bloodstream indication merely confirms that concept.
      2. The C terminal region required for targeting is a reasonable deletion analysis of regions of the protein. But can this data (line 228) be said to "mediate targeting" - or is it just required. For instance, targeting might be OK but it might be needed for stable association, etc etc.
      3. This protein has already been shown to be phosphorylated and the sites and cell cycle possibilities have been mapped by Urbaniak. So that section adds little. https://doi.org/10.1371/journal.ppat.1008129
      4. Essentiality in BS forms and pocket enlargement. This is not surprising. A very large number of cytoskeletal proteins show this in RNAi knockdown. Flagella mutants (extensive publications from many groups (Hill, Bastin, Gull, etc) over last 15 years show this very well and so this protein is just one more example.
      5. I didn't find that the explanations for flagella pocket enlargement are soundly based. The experiments focus on endocytosis and uptake and ignore other plausible reasons and some evidence in literature.
         Lines 84/85. Enlarged pockets may be indicative of endocytosis failure. Presumably the rationale is that endocytosis fails, but exocytosis still occurs and the pocket membrane enlarges. What evidence is there that exocytosis of membrane still occurs? This simple concept might indeed operate in a clathrin mutant but is surface membrane/content exocytosis maintained in these cytoskeleton mutants? There is good evidence for glycoconjugates within the flagellar pocket. Are these depleted or present still?
      6. There are also a number of other publications indicating that clathrin pits are still present on the enlarged pockets of various mutants when viewed by EM. The authors have looked at the flagellar pockets by EM but the EM methods described have extensive washings and centrifugations before fixation. This is a very poor approach and will mean that endo and exocytic traffic is disturbed (extensive references in literature in other systems). This is not a useful approach for exo/endocytosis studies where flux of traffic demands fast chemical or freezing fix in media.
      7. The EMs and light microscopy do show that the mutant pockets are substantially abnormal in their cytoskeletal arrangement. They have multiple flagella profiles, flagella structures have not connected with the membrane and are sometimes in the cytoplasm (see a glance of the paraflagellar rod in the cytoplasm in FigS5C and internalised FAZ attachment plaques in Fig 4 D bottom right cell). Given these extensive (and expected) cytoskeletal abnormalities it is highly likely that these pocket abnormalities are a result of motility, cell division/developmental issues and the differential uptake phenotypes merely consequential.
      8. The authors speak about early phenotypes, but these are often at 15-24 hours. That is probably a couple of cell cycles and so not early. In relation to the above question of comparison to the same morphology produced by flagella mutants it would be good to know if these hook mutants produce motility phenotypes and whether these are manifest before the uptake phenotypes. There is evidence (cited here) that forward motility of the trypanosome directs material on surface into the pocket. If these cells have motility defects (primary or via failed division) then surely that would provide an alternative simple explanation for uptake differences.
      9. There is a general point that if studies are to have real relevance to uptake in the trypanosome then they need to deal with uptake of natural ligands rather than artificial surrogates such as dextran. Such tracers were used historically, but in the last decade a series of receptors and ligands for fluid phase and particularly membrane mediated endocytosis have been discovered. With the investment of a little time these important ligand / receptors such as haptoglobin, transferrin, etc would be much more relevant.

      Referees cross-commenting

      This session includes comments from Reviewer 1 and Reviewer 2.

      Reviewer 2

      Dear Reviewers 1 and 3:
      I agree with many of the points made by Reviewer 1 and our divergence is partly a matter of degree. While it is true that this manuscript is incremental in its contribution to our understanding of TbSmee1, it nonetheless adds to our understanding of the role of this protein in the bloodstream life stage and because of that I find value in the work. The fact that it mirrors what was seen in other protein knockdown studies (e.g. TbMORN) doesn't negate its contribution for me. Reviewer 1 makes an important point, however, when stating that this work does not add a mechanistic or functional conclusion as to how TbSmee1 operates and for me that is the biggest shortcoming of the work. Offering mechanistic insight is a high bar and while it would make for a much more exciting story it does not discount the value of the work as presented. What I do appreciate is the speculation about this observation that endocytosis is required for entrance of surface bound material into the pocket and although they are unable to show that this is not a side effect of other processes being disrupted it is an intriguing point. These observations have the potential of stimulating further investigations into crosstalk between the entrance to the pocket and endocytosis. I also agree that the use of ligands for known receptors like transferrin would be far more informative. While I assumed the transferrin receptor was in the pocket itself it would be interesting to see if the ESAG6/7 is also located outside the pocket and transiently binds cargo before being brought inside for endocytosis.
      I think that Reviewer 3 brings up a great point with the focus on VSGs. I think that examining VSG turnover in these mutants can add value to the analysis and inform our view of how affecting the hook complex alters VSG endocytosis.

      Reviewer 1

      Some fair comment and agreement. This is being sent to general cell biology journals.
      When one looks at this area in the round it is nearly 50 years (1975) since Langreth and Balber published their seminal work on protein uptake and digestion in bloodstream and culture forms of T. brucei. There have been 50 years of intense study and the genome has been around for nearly 20 years as well. So, put simply - for both a general science audience and the wider parasite community - if this is a paper about one protein, TbSmee1, then it surely has to say something functional about that protein. If it is a paper about uptake in trypanosomes (where mutants are one means of interrogation) then it surely has to say something about mechanisms of uptake of physiologically relevant ligands. The days of dextran etc are past. Hence, my comment that this does neither and so is very incremental to what is known already. It is 2022 not 1975. Langreth and Balber published their seminal work in J Protozoology for very good reasons no doubt.

      Reviewer 2
      Thank you for replying and I agree with the spirit of your critique. My only comment, which could result from my own naivete, is to say that despite the incredible work that has been done in dissecting endocytosis in T. brucei over these past 50 years, it appears that we still do not understand how many fundamental aspects of this activity work in this parasite. Even basic questions regarding how cargo, e.g. transferrin, binding to surface receptors is sensed by the parasite remain unknown and the identity of the specific signaling components which transmit this information internally to initiate endocytosis has not been characterized. In many ways it seems that we don't even understand how the parasite partitions the endo/exocytic pathways in the pocket and maintains membrane homeostasis. While we know that some kinases and traditional signaling components must be involved, a high resolution understanding of this process in T. brucei seems lacking. I only say all this to suggest that the field maybe isn't yet that advanced to reject work of this type as so many mechanistic unknowns still remain to be uncovered and maybe incremental advances and phenomenology still can add value to the field. However, I respect your opinion on the matter and my perspective could be due to a lack of a full appreciation of the literature on the subject.

      Significance

      Unfortunately, I did not find this to be very significant. It covers old ground in terms of the phenotype described. Many groups have shown the differences between procyclic and bloodstream phenotypes in this enlarged pocket phenomenon. The work is rather incremental from these and other authors' work on these hook proteins.

      There are alternative explanations for understanding the effect of flagella pocket structure and uptake of ligands into the pocket and trypanosome cell. These would need to be studied before one could see a functional, mechanistic link established.

      Other parts of this are nicely done but do not move our understanding on (e.g. targeting/phosphorylation) from what has been done previously.

    1. Abstract

      Recent advances in genome-wide association study (GWAS) and sequencing studies have shown that the genetic architecture of complex diseases and traits involves a combination of rare and common genetic variants, distributed throughout the genome. One way to better understand this architecture is to visualize genetic associations across a wide range of allele frequencies. However, there is currently no standardized or consistent graphical representation for effectively illustrating these results.

      Here we propose a standardized approach for visualizing the effect size of risk variants across the allele frequency spectrum. The proposed plots have a distinctive trumpet shape, with the majority of variants having low frequency and small effects, while a small number of variants have higher frequency and larger effects. These plots, which we call ‘trumpet plots’, can help to provide new and valuable insights into the genetic basis of traits and diseases, and can help prioritize efforts to discover new risk variants. To demonstrate the utility of trumpet plots in illustrating the relationship between the number of variants, their frequency, and the magnitude of their effects in shaping the genetic architecture of complex diseases and traits, we generated trumpet plots for more than one hundred traits in the UK Biobank. To facilitate their broader use, we have developed an R package ‘TrumpetPlots’ and R Shiny application, available at https://juditgg.shinyapps.io/shinytrumpets/, that allows users to explore these results and submit their own data.

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.89) and has published the reviews under the same license. These are as follows.

      **Reviewer 1. Clara Albiñana**

      As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code?

      No. Although there are no explicit guidelines for contribution in the manuscript or website, it is true that by placing the project on gitlab it is possible to contribute to the project / open issues.

      Is the code executable?

      No. Unfortunately, I wasn't able to install the R package. I have now opened an issue on the gitlab page so that it can hopefully get solved.

      Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

      Yes. It is very common for new R packages to just use devtools for installation.

      Is the documentation provided clear and user friendly?

      Yes. The requirements for generating a trumpet plot just involve providing a set of GWAS summary statistics with specific column names, together with the GWAS sample size. This is very common for GWAS summary statistics-based tools. I think it is fine for the R package to require renaming the columns to fit the format, as one already needs to load the file into R. However, I find it inconvenient to have to re-save the summary statistics file with different column names for the shinyapp tool. Providing e.g. column indexes alone would be much more user-friendly.

      Is there enough clear information in the documentation to install, run and test this tool, including information on where to seek help if required?

      No. I cannot answer this question until I can install the tool.

      Have any claims of performance been sufficiently tested and compared to other commonly-used packages?

      Not applicable. There are no existing comparable tools.

      Is automated testing used or are there manual steps described so that the functionality of the software can be verified?

      Yes. I can see there is a toy dataset included with the R package.

      Additional Comments:

      I think the manuscript is very clear and good at making the point of the utility of the software. The proposed trumpet plots are very visually appealing and can be useful to characterise the genetic variation of diverse phenotypes. The novelty of the trumpet plots, as compared to previously proposed effect size vs. allele frequency plots, is the use of positive and negative effect sizes, making it look like a trumpet. I also appreciate the style decisions in the standard generated plots, with a nice visually-appealing color scheme and design.

      On the use of the software, I have focused my testing on the R package, which I was not able to install. The shinyapp is very useful for visualising the existing, pre-computed trumpet plots, but I do not find it very useful for generating user-uploaded summary statistics for the reasons I mentioned above. Another comment on the ShinyApp is that I appreciate the possibility to download the plots but it would be very useful to include the name of the visualized phenotype as the plot title, for example, to avoid confusion when downloading multiple plots.

      I also found an incorrect sentence in the abstract, which I think should be reversed: "The proposed plots have a distinctive trumpet shape, with the majority of variants having low frequency and small effects, while a small number of variants have higher frequency and larger effects".

      **Reviewer 2. Wentian Li**

      Is the documentation provided clear and user friendly?

      No. Many aspects of Fig.1 are not explained.

      Overall Comments: A plot with allele frequency on the x axis and effect size (e.g. odds ratio) on the y axis is a very common display of the contribution from both common and rare alleles to genetic association. A schematic form of this plot is on almost everybody's presentation slides when introducing this topic (for an example, see Science (23 Nov 2012), vol 338(6110), pp.1016-1017). Considering how many people are already familiar with this type of plot, I feel that very little new is added in this paper: maybe only a new name ("trumpet"), and/or the power lines. The other methods contributions (log-x, one variant per LD, avoiding gene-level statistics) are rather straightforward. People without experience with "shiny" (R package) can still use ggplot2 or plot in R to get the same result. Generally speaking, I think the paper is weak, though OK as a program/package announcement.

      Major comments: * I think the trumpet shape (increase of "effect size" for rare variants) is probably a direct consequence of using the odds ratio as a measure of effect size. If the allele frequency in the normal population is p0 and that in the disease population is p1, then [p1/(1-p1)]/[p0/(1-p0)] ~ p1/p0 tends to be large for small p0's, simply because the denominator is small. On the other hand, if population attributable risk (p0(RR-1)/(1+p0(RR-1))) is used as the y-axis, I am uncertain what the shape of the plot would be.

      • A risk allele has these pieces of information:
      1) allele frequency,
      2) effect size (e.g. odds ratio),
      3) type-I error/p-value,
      4) type-II error/power.

      The plot in this paper shows #1 vs #2, with #4 added as an extra. In another publication with a proposal to plot genetic association results (Comp Biol. and Chem. (2014), 48:77-83 doi: 10.1016/j.compbiolchem.2013.02.003), #2 is plotted against #3, with #1 as an added extra. I'm sure using other combinations could lead to other types of plots. The authors should discuss/compare these possibilities.
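
      The reviewer's first major comment, on the odds ratio versus the population attributable risk (PAR), can be checked numerically. The following is a minimal sketch under stated assumptions, not part of the review or the reviewed software: the function names and the allele frequencies are hypothetical, the case frequency is assumed to exceed the control frequency by a fixed absolute amount, and the odds ratio is used as a stand-in for the relative risk RR.

```python
def odds_ratio(p1: float, p0: float) -> float:
    """Odds ratio for allele frequency p1 in cases versus p0 in controls."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


def population_attributable_risk(p0: float, rr: float) -> float:
    """PAR = p0 (RR - 1) / (1 + p0 (RR - 1)), the formula quoted in the review."""
    excess = p0 * (rr - 1)
    return excess / (1 + excess)


# Hypothetical scenario (values invented for illustration): the case
# frequency always exceeds the control frequency by the same amount, 0.005.
for p0 in (0.001, 0.01, 0.1, 0.3):
    p1 = p0 + 0.005
    or_value = odds_ratio(p1, p0)
    # Assumption: the odds ratio approximates RR, which holds for rare outcomes.
    par = population_attributable_risk(p0, or_value)
    print(f"p0={p0:<6} OR={or_value:7.3f}  p1/p0={p1 / p0:7.3f}  PAR={par:.4f}")
```

      Under these assumptions the odds ratio rises steeply as p0 shrinks, while the PAR changes comparatively little, which is consistent with the reviewer's suggestion that the trumpet shape depends on the chosen effect-size measure.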

      Minor comments: In Fig.1, the size of the dots, the brown vs cyan color, the discontinuity of scatter dots around 0.01, are not explained.

      Re-review:

      I have read the authors' response and I'm mostly satisfied. Only two minor comments: * Witte 2014 Nature Rev. Genet. article summarizes the point I tried to make well. I understand that rare variants should have a relatively higher effect from an evolutionary perspective, but since these are rare, their individual or even collective contribution to a disease in the population is still small. A casual reader may not realize this point and I think it would be helpful to cite Witte's article. * My minor comment on Fig.1 is still not addressed: there seem to be more points on the right side of the p=0.01 line than on the left side. Why this discontinuity? (The added text in the revision is about the color and size of the dots, not about this discontinuity.)

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This study presents a useful inventory of the joint effects of genetic and environmental factors on psychotic-like experiences, and identifies cognitive ability as a potential underlying mediating pathway. The data were analyzed using solid and validated methodology based on a large, multi-center dataset. However, the claim that these findings are of relevance to psychosis risk and have implications for policy changes are only partially supported by the results.

      We appreciate the feedback and insightful suggestions from the editor and reviewers, which helped us improve the manuscript. We believe the concerns initially raised were mostly due to areas that needed further clarification, which we have now clarified in this revised version. Our primary contribution lies in our meticulous analytical approach aimed at minimizing confounding effects and providing more precise estimates of the genetic and environmental impact on children's cognition and psychology. This method differs from the widely used general linear modeling in the field, which, in our opinion, may not be the optimal strategy for large-scale data analysis. Our comprehensive, tutorial-style description of the methods might serve as a valuable resource for the community.

      Regarding the critique that our findings 'partially support the relevance to psychosis risk,' we have updated our manuscript to more accurately reflect this feedback. We have altered the narrative to indicate that psychotic-like experiences (PLE) are associated with the risk for psychosis, a connection substantiated by prior studies cited in our manuscript.

      Similarly, in response to the comment that our findings 'partially support implications for policy changes,' we have nuanced our conclusion. However, we would like to emphasize our discovery that a negative genetic predisposition impacting cognitive development (i.e., low polygenic scores for cognitive phenotypes) can be counteracted by a positive school and familial environment. We believe that this finding could have meaningful implications for policy making and is robustly supported by our analyses.

      We hope this revised manuscript more accurately reflects our research findings and their significance. Lastly, we would like to express our gratitude for your fair and detailed review process. Our experience working with eLife has been incredibly rewarding, and we commend your dedication to an encouraging and progressive publishing culture.

      Public Reviews:

      Reviewer #1

      This study by Park et al. describes an interesting approach to disentangle gene-environment pathways to cognitive development and psychotic-like experiences in children. They have used data from the ABCD study and have included PGS of EA and cognition, environmental exposure data, cognitive performance data and self-reported PLEs. Although the study has several strengths, including its large sample size, interesting approach and comprehensive statistical model, I have several concerns:

      • The authors have included follow-up data from the ABCD Study. However, it is not very clear from the beginning that longitudinal paths are being explored. It would be very helpful if the authors would make their (analysis) approach clearer from the introduction. Now, they describe many different things, which makes the paper more difficult to read. It would be of great help to see the proposed path model in a Figure and refer to that in the Method.

      We clarified the longitudinal paths tested in this study in Intro [line 149~159]. We also added a figure of the proposed path model (Figure 1) [Methods: line 231~238].

      • There is quite a lot of causal language in the paper, particularly in the Discussion. My advice would be to tone this down.

      We adjusted and moderated the use of causal language throughout the manuscript.

      • I feel that the limitation section is a bit brief, and can be developed further.

      We clearly specified the limitations of our study. These included concerns about the representativeness of the ABCD samples, the limited scope of longitudinal data, and the use of non-randomized, observational data [line 524~544].

      • I like that the assessment of CP and self-reports PEs is of good quality. However, I was wondering which 4 items from the parent-reported CBCL were used and how did they correlate with the child-reported PEs? And how was distress taken into account in the child self-reported PEs measurement? Which PEs measures were used?

      Thanks for the clarification question. We report the Pearson’s correlation coefficients between the PLEs [line 198~200]. (The Reviewer #1 may have referred to the prior version of our manuscript submitted elsewhere, for this point has been already addressed in our initial submission to eLife).

      • What was the correlation between CP and EA PGSs?

      The Pearson’s correlation between CP and EA PGS was 0.4331 (p<0.0001). We added the statistics to the manuscript. [line 214]

      • Regarding the PGS: why focus on cognitive performance and EA? It should be made clearer from the introduction that EA is not only measuring cognitive ability, but is also a (genetic) marker of social factors/inequalities. I'm guessing this is one of the reasons why the EA PGS was so much more strongly correlated with PEs than the CP PGS. See the work by Abdellaoui and the work by Nivard.

      We appreciate the reviewer’s insightful feedback. Acknowledging the role of both CP and EA PGSs in our study, we agree with the observation that EA PGS goes beyond gauging cognitive aptitude: it also serves as an indicator of societal influences and inequalities. The multifaceted nature of EA PGS could be the reason underlying the stronger correlation with PLEs compared to CP PGS. In response to this feedback, we revised our introduction to articulate the multifaceted role of EA PGS in more precise terms. To support our assertions, we have included references to prior studies (Abdellaoui et al., 2022) [line 131~142].

      Abdellaoui, A., Dolan, C. V., Verweij, K. J. H., & Nivard, M. G. (2022). Gene–environment correlations across geographic regions affect genome-wide association studies. Nature Genetics. doi:10.1038/s41588-022-01158-0

      • Considering previous work on this topic, including analyses in the ABCD Study, I'm not surprised that the correlation was not very high. Therefore, I don't think it makes a whole lot of sense to adjust for the schizophrenia PGS in the sensitivity analyses; in other words, it's not really 'a more direct genetic predictor of PLEs'.

      We thank the reviewer for the thoughtful comments. We acknowledge that the correlation between schizophrenia PGS and PLE may not be exceedingly high, as evidenced by previous work, including analyses from the ABCD study. However, we would like to emphasize our rationale for adjusting for schizophrenia PGS in the sensitivity analyses. Our study design stemmed from the established associations between PLEs and increased risk for schizophrenia. Existing studies have reported significant associations between schizophrenia PGS and cognitive deficits in both psychosis patients (Shafee et al., 2018) and people at risk for psychosis (He et al., 2021). Notably, the PGS for schizophrenia has shown significant associations with PLEs, arguably more so than the PGS for PLEs itself (Karcher et al., 2021). Our updated manuscript has incorporated these references to improve clarity [line 307~309]. By adding this layer of adjustment, we believe that our mixed linear model more precisely examines the relationship between the cognitive phenotype PGS and PLEs, in terms of both sensitivity and specificity.

      He, Q., Jantac Mam-Lam-Fook, C., Chaignaud, J., Danset-Alexandre, C., Iftimovici, A., Gradels Hauguel, J., . . . Chaumette, B. (2021). Influence of polygenic risk scores for schizophrenia and resilience on the cognition of individuals at-risk for psychosis. Translational Psychiatry, 11(1). doi:10.1038/s41398-021-01624-z

      Karcher, N. R., Paul, S. E., Johnson, E. C., Hatoum, A. S., Baranger, D. A. A., Agrawal, A., . . . Bogdan, R. (2021). Psychotic-like Experiences and Polygenic Liability in the Adolescent Brain Cognitive Development Study. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. doi:https://doi.org/10.1016/j.bpsc.2021.06.012

      Shafee, R., Nanda, P., Padmanabhan, J. L., Tandon, N., Alliey-Rodriguez, N., Kalapurakkel, S., . . . Robinson, E. B. (2018). Polygenic risk for schizophrenia and measured domains of cognition in individuals with psychosis and controls. Translational Psychiatry, 8(1). doi:10.1038/s41398-018-0124-8

      • How did the FDR correction for multiple testing affect the results?

      Please note that we have clarified our FDR correction in the methods.

      As detailed in the method section [line 254~255], we applied False Discovery Rate (FDR) correction for multiple testing across nine key variables in the study: PGS (CP or EA), family income, parental education, family’s financial adversity, Area Deprivation Index, years of residence, proportion of population below -125% of the poverty line, positive parenting behavior, and positive school environment. An exception was made in our additional sensitivity analysis, where we included schizophrenia PGS in the linear mixed model for adjustment, thus the FDR correction was applied across ten key variables instead. Overall, the application of FDR correction had minimal impact on our findings. Most associations between the key variables and the outcomes that were originally marked as highly significant sustained their significance after the FDR correction.
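
      For readers unfamiliar with FDR correction across a small family of tests, the sketch below shows one common procedure, Benjamini-Hochberg. This is an assumption made for illustration: the response does not state which FDR procedure was used, and the function name and the nine p-values are invented and are not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values for nine hypothetical predictors (not from the study).
example_p = [0.0001, 0.004, 0.012, 0.03, 0.045, 0.2, 0.5, 0.0008, 0.07]
adjusted_p = benjamini_hochberg(example_p)
# Flag which hypothetical predictors remain significant at a 5% FDR.
significant = [q < 0.05 for q in adjusted_p]
```

      With only nine or ten tests, small raw p-values are scaled by modest factors, which is consistent with the authors' remark that highly significant associations mostly kept their significance after correction.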

      Overall, I feel that this paper has the potential to present some very interesting findings. However, at the moment the paper misses direction and a clear focus. It would be a great improvement if the readers would be guided through the steps and approach, as I think the authors have undertaken important work and conducted relevant analyses.

      We express our appreciation to the reviewer for the positive feedback and constructive suggestions, which only serve to improve and strengthen our manuscript. We have incorporated the suggested corrections and clarifications in response to the reviewer's suggestions. We believe that these changes will not only enhance the overall readability but also more effectively emphasize the significance and implication of our work.

      Reviewer #2 (Public Review):

      This paper tried to assess the link between genetic and environmental factors on psychotic-like experiences, and the potential mediation through cognitive ability. This study was based on data from the ABCD cohort, including 6,602 children aged 9-10y. The authors report a mediating effect, suggesting that cognitive ability is a key mediating pathway in the link between several genetic and environmental (risk and protective) factors on psychotic-like experiences.

      While these findings could be potentially significant, a range of methodological unclarities and ambiguities make it difficult to assess the strength of evidence provided.

      Strengths of the methods:

      The authors use a wide range of validated (genetic, self- and parent-reported, as well as cognitive) measures in a large dataset with a 2-year follow-up period. The statistical methods have the potential to address key limitations of previous research.

      Weaknesses of the methods:

      The rationale for the study is not completely clear. Cognitive ability is probably a more likely mediator of traits related to negative symptoms in schizophrenia, rather than positive symptoms (e.g., psychosis, psychotic-like symptom). The suggestion that cognitive ability might lead to psychotic-like symptoms in the general population needs further justification.

      We appreciate the reviewer’s concern regarding the role of cognitive ability in relation to schizophrenia symptoms. We are aware that cognitive ability often serves as a mediator of psychotic-like experiences. However, to the best of our knowledge, a growing body of research has proposed that cognitive ability can mediate positive symptoms in schizophrenia, including psychotic-like experiences. The studies by Howes & Murray (2014) and Garety et al. (2001) suggested that deficits in cognitive ability can potentially contribute to the manifestation of positive symptoms such as psychotic-like experiences. We have elaborated on this aspect in the Introduction section [line 104-115].

      Howes, O. D., & Murray, R. M. (2014). Schizophrenia: an integrated sociodevelopmental-cognitive model. The Lancet, 383(9929), 1677-1687. doi:https://doi.org/10.1016/S0140-6736(13)62036-X

      Garety, P. A., Kuipers, E., Fowler, D., Freeman, D., & Bebbington, P. E. (2001). A cognitive model of the positive symptoms of psychosis. Psychological Medicine, 31(2), 189-195. doi:10.1017/S0033291701003312

      Terms are used inconsistently throughout (e.g., cognitive development, cognitive capacity, cognitive intelligence, intelligence, educational attainment...). It is overall not clear what construct exactly the authors investigated.

      We thank the reviewer for the feedback regarding the consistency of terminology in our manuscript. Per the suggestion, we standardized the use of ‘cognitive capacity’ and now consistently refer to it as ‘cognitive phenotypes’ throughout our manuscript. Furthermore, we explicitly stated in the Introduction section that our two PGSs of focus will be termed ‘cognitive phenotypes PGSs’, aligning with terminology used in prior studies (Joo et al., 2022; Okbay et al., 2022; Selzam et al., 2019) [line 140~142].

      Joo, Y. Y., Cha, J., Freese, J., & Hayes, M. G. (2022). Cognitive Capacity Genome-Wide Polygenic Scores Identify Individuals with Slower Cognitive Decline in Aging. Genes, 13(8), 1320. doi:10.3390/genes13081320

      Okbay, A., Wu, Y., Wang, N., Jayashankar, H., Bennett, M., Nehzati, S. M., . . . Young, A. I. (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics, 54(4), 437-449. doi:10.1038/s41588-022-01016-z

      Selzam, S., Ritchie, S. J., Pingault, J.-B., Reynolds, C. A., O’Reilly, P. F., & Plomin, R. (2019). Comparing Within- and Between-Family Polygenic Score Prediction. The American Journal of Human Genetics, 105(2), 351-363. doi:https://doi.org/10.1016/j.ajhg.2019.06.006

      Not the largest or most recent GWASes were used to generate PGSes.

      We appreciate the reviewer’s observation. Indeed, we were unable to utilize the most recent or the largest GWAS for cognitive performance, educational attainment, and schizophrenia due to the timeline of our study. Regrettably, the commencement of our study preceded the publication of the currently largest and most recent GWAS studies by Okbay et al. (2022) and Trubetskoy et al. (2022). Our research was conducted with the best available data at that time, which was the GWAS of European-descent individuals for educational attainment and cognitive performance (Lee et al., 2018). To eliminate any potential confusion, we adjusted the text to specify that our study used 'a GWAS of European-descent individuals for educational attainment and cognitive performance' rather than the largest GWAS [line 206~208].

      It is not fully clear how neighbourhood SES was coded (higher or lower values = risk?). The rationale, strengths, and assumptions of the applied methods are not fully clear. It is also not clear how/if variables were combined into latent factors or summed (weighted by what). It is not always clear when genetic and when self-reported ethnicity was used. Some statements might be overly optimistic (e.g., providing unbiased estimates, free even of unmeasured confounding; use of representative data).

      Thank you for pointing this out. Consistent with the illustration of neighborhood SES in the Methods, higher values of neighborhood SES indicate risk [line 217~228]. In the original Figure 2, a higher value of neighborhood SES is linked to lower intelligence (direct effects: β=-0.1121) and higher PLEs (indirect effects: β=-0.0126~ -0.0162). We think such confusion might have been caused by the difference between family SES (higher values = lower risk) and neighborhood SES (higher values = higher risk). Thus, we changed the terms to ‘High Family SES’ and ‘Low Neighborhood SES’ in the corrected figure (Figure 3) for clarification.

      Considering that shorter duration of residence may be associated with instability of residency, it may indicate neighborhood adversity (i.e., higher risk). This definition of the ‘years of residence’ variable is in line with the previous study by Karcher et al. (2021).

      During estimation, the IGSCA determines weights of each observed variable in such a way as to maximize the variances of all endogenous indicators and components. We added this explanation in the description about the IGSCA method [line 266~268].

      We deleted overly optimistic statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead, throughout our manuscript.

      Karcher, N. R., Schiffman, J., & Barch, D. M. (2021). Environmental Risk Factors and Psychotic-like Experiences in Children Aged 9–10. Journal of the American Academy of Child & Adolescent Psychiatry, 60(4), 490-500. doi:10.1016/j.jaac.2020.07.003

      It appears that citations and references are not always used correctly.

      We thoroughly checked all citations and specified the references for each statement: We deleted Plomin & von Stumm (2018) and Harden & Koellinger (2020) and cited relevant primary studies (e.g., Lee et al., 2018; Okbay et al., 2022; Abdellaoui et al., 2022) instead. We also specified the references supporting the statement that educational attainment PGS links to brain morphometry (Judd et al., 2020; Karcher et al., 2021). As Okbay et al. (2022) use PGS of cognitive intelligence (which mentions the analyses results in their supplementary materials) as well as educational attainment, we decided to continue citing this reference [line 131~141].

      Strengths of the results:

      The authors included a comprehensive array of analyses.

      We thank the reviewer for the positive comment.

      Weaknesses of the results:

      Many results, which are presented in the supplemental materials, are not referenced in the main text and are so comprehensive that it can be difficult to match tables to results. Some of the methodological questions make it challenging to assess the strength of the evidence provided in the results.

      As you rightly identified, we inadvertently failed to reference Table S2 in the main text. We have since corrected this omission in the Results section for the IGSCA (SEM) analysis [line 376]. The remainder of the supplementary tables (Table S1, S3~S7) have been appropriately cited in the main manuscript. We recognize that the quantity of tables provided in the supplementary materials is substantial. However, given the comprehensiveness and complexity of our analyses, which encompass a wide array of study variables, these tables offer intricate results from each analysis. We deem these results, which include valuable findings from sensitivity analyses and confound testing, too significant to exclude from the supplementary materials. That said, we are open to, and would greatly welcome, any further suggestions on how to present our supplementary results in a more clear and digestible format. Your guidance in this matter is highly valued.

      Appraisal:

      The authors suggest that their findings provide evidence for policy reforms (e.g., targeting residential environment, family SES, parenting, and schooling). While this is probably correct, a range of methodological unclarities and ambiguities make it difficult to assess whether the current study provides evidence for that claim.

      We believe that with the improvement we made in this revised manuscript, this concern may have been successfully mitigated.

      Impact:

      The immediate impact is limited given the short follow-up period (2y), possible concerns for selection bias and attrition in the data, and some methodological concerns.

      We appreciate the feedback provided in the reviewer's impact statement. We added as study limitations [line 524~544] that the impact of our findings may be limited due to the relatively short follow-up period, the possibility of sample selection bias, and the problems of interpreting results from an observational study as causality (despite the novel causal inference methods, designed for non-randomized, observational data, that we used).

      As responded above (and also in more detail in the Reviewer #2’s Recommendations For The Authors section below), we made necessary corrections and clarifications for the points suggested by the reviewer. As we are willing to make additional revisions, please feel free to give comments if you feel that our corrections are insufficient or inappropriate.

      Nevertheless, we would like to discuss some points. We sincerely hope this following response does not come across as argumentative to the reviewer and the editor. We fully understand the reviewer's perspective on this matter, and we agree that the issues raised about the ABCD study are absolutely valid. However, when evaluating the overall impact of a study, other factors, such as how the field has been assessing the impact of similar studies, should also be considered.

      Firstly, the potential selection bias and attrition in the ABCD data may not necessarily limit the conclusions of this study. While recognizing the potential issues with the ABCD data is important, we feel that judging the impact of our findings as "limited" based on these issues may not be entirely fair. This is because no study, particularly those of a nationwide scale such as the UK Biobank, IMAGEN, HEAL, HBCD, etc., is completely free of limitations. Typically, the potential limitations of the data don't undermine the impact of individual studies' findings. Numerous studies using ABCD data have been published in top-tier journals—despite the limitations of the ABCD study—underscoring the scientific merit of the findings. For example, the study by Tomasi, D., & Volkow, N. D. (2021), entitled "Associations of family income with cognition and brain structure in USA children: prevention implications," published in Molecular Psychiatry, might be highly relevant to the limitations of the ABCD study raised by the reviewer. The scientific community, including editors, reviewers, and readers, may have appreciated the impact of this study despite the acknowledged limitations of the ABCD data.

      Secondly, the two-year time window of our longitudinal analysis might not impact the aim of this study—an iterative assessment of the associations between genetic and environmental variables with cognitive intelligence and mental health, with a focus on PLE, in preadolescents. Had we aimed to test the developmental trajectory from childhood to adolescence, perhaps a longer timeframe would have made more sense. So, we do not agree with the reviewer’s assessment that the short time window limits the impact of our study.

      Suggested revisions based on the combined reviewer feedback:

      1) The terminology used should be carefully reviewed and revised

      • Please use the correct terminology for the key concepts assessed in this study. For example, authors sometimes conflate PLEs and psychosis, two related but separate constructs. Furthermore, the terms 'good parenting' and 'good schooling' are vague and subjective.

      • The authors use multiple terms to refer to cognitive ability (cognitive capacity, intelligence, cognitive intelligence, etc). The term 'cognitive development' in the title and manuscript does not seem to be justified given the focus on different measures of cognitive ability at a single time point (i.e. baseline).

      • Please avoid causal language and using statements that cannot be entirely substantiated (e.g. unbiased estimates, free from unmeasured confounding)

      Thank you for suggesting this point. We revised all key terminologies used throughout our manuscript.

      Per your suggestion, we specified that PLEs indicate the risk of psychosis and often precede schizophrenia. We checked all misused cases of the term ‘psychosis’ and corrected them as ‘PLEs’. We also changed the terms 'good parenting' and 'good schooling' to ‘positive parenting behavior’ and ‘positive school environment’.

      We changed the term ‘cognitive development’ to ‘cognitive ability’ throughout our manuscript. We also changed the title to ‘Gene-Environment Pathways to Cognitive Intelligence and Psychotic-Like Experiences in Children’ because we used ‘cognitive intelligence’ for NIH toolbox variable in the text.

      We corrected and toned down all causal language used in our manuscript. As mentioned by the reviewers, we deleted statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead.

      2) A stronger rationale for the focus on PLEs, and the potential mediating role of cognitive ability in genetic and environmental effects on PLES, should be provided

      We appreciate the concerns raised about whether cognitive ability may serve as a mediator of psychotic-like experiences. To the best of our knowledge, it has been proposed that cognitive ability can be a mediator of positive symptoms in schizophrenia (including psychotic-like experiences), as well as negative symptoms. This mediating role of cognitive ability was proposed in several prior studies on cognitive models of schizophrenia/psychosis. Per your suggestion, we included an additional justification in the Introduction [line 104~115], where we highlighted that cognitive ability has been proposed as a potential mediator of genetic and environmental influences on positive symptoms of schizophrenia such as psychotic-like experiences. We refer to studies conducted by Howes & Murray (2014) and Garety et al. (2001).

      Howes, O. D., & Murray, R. M. (2014). Schizophrenia: an integrated sociodevelopmental-cognitive model. The Lancet, 383(9929), 1677-1687. doi:https://doi.org/10.1016/S0140-6736(13)62036-X

      Garety, P. A., Kuipers, E., Fowler, D., Freeman, D., & Bebbington, P. E. (2001). A cognitive model of the positive symptoms of psychosis. Psychological Medicine, 31(2), 189-195. doi:10.1017/S0033291701003312

      3) As described in more detail by the reviewers, more information should be provided about the measures used in the study and how they relate to one another (e.g. correlations between PQ-BC and CBCL; PGS-CA and PGS-EA).

      Thank you for your suggestion. Although this information was already provided in our initial submission, it appears that Reviewer #1 might have referred to a prior version of our manuscript submitted elsewhere before eLife.

      To clarify, our findings reveal significant Pearson’s correlation coefficients between PLEs across all time-points (baseline year: r=0.095~0.0989, p<0.0001; 1-year follow-up: r=0.1322~0.1327, p<0.0001; 2-year follow-up: r=0.1569~0.1632, p<0.0001), and we added this information to the Methods section [line 198~200]. We also added the Pearson’s correlation between the two PGSs (r=0.4331, p<0.0001) to the Methods for PGS [line 214].
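      As an illustration of the statistic reported above, the following is a minimal sketch of how a Pearson’s correlation coefficient between two measures can be computed. The variable names and toy scores are hypothetical and are not ABCD Study data; the sketch is not the code used in the study.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy scores for two PLE-related measures (illustration only)
measure_a = [0, 2, 1, 5, 3, 0, 4]
measure_b = [1, 1, 2, 4, 2, 0, 5]
r = pearson_r(measure_a, measure_b)
print(f"r = {r:.4f}")
```

      The same function applies to any pair of continuous variables, for example two polygenic scores.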

      4) More details are needed regarding the analytical strategies used (e.g. how imputation was performed, why PGS were not based on the largest and most recent GWASes, whether latent or observed variables were examined, what exactly the supplementary materials show and how they relate to information provided in the main text).

      We appreciate your feedback. We acknowledge the concerns about the GWAS sources utilized for the study. Unfortunately, our study commenced prior to the publication of the ‘currently’ most recent or largest GWAS by Okbay et al. (2022) and Trubetskoy et al. (2022). Our research was conducted with the best available data at that time, which was the largest GWAS of European-descent individuals for educational attainment and cognitive performance (Lee et al, 2018). We have now clarified this point in the manuscript. [line 206~208]

      Also, we specified the use of composite indicators for the PGS, family SES, neighborhood SES, positive family and school environment, and PLEs, while latent factors were used for cognitive intelligence [line 269~285].

      We highly appreciate the reviewer’s comments regarding the supplementary materials. We regret overlooking the citation of Table S2 in the main manuscript, and this has now been rectified in the Results section for the IGSCA (SEM) analysis [line 376]. The remaining supplementary tables (Table S1, S3~S7) have been correctly referenced within the manuscript. We acknowledge that the supplementary materials are extensive; this reflects the wide array of study variables and the intricate results from each analysis. We deem these results, which include valuable findings from sensitivity analyses and confound testing, too crucial to exclude from the supplementary materials. That said, we are fully committed to making any changes needed to make our supplementary results more accessible and digestible, and we look forward to any further recommendations.

      5) The limitation section should be expanded and statements regarding the implications of the study findings should be qualified accordingly (e.g. short follow-up period, potential for attrition and selection bias, reverse causation, etc)

      We specified additional potential constraints of our study, including limited representativeness, limited periods of follow-up data (baseline year, 1-year, and 2-year follow-up), possible sample selection bias, and the use of non-randomized, observational data [line 524~544].

      6) Please ensure that the references provided support the statements in the text to which they are linked to.

      Thank you for pointing this out. We thoroughly went over all citations and corrected the inaccurately or vaguely cited references for each statement.

      Reviewer #2 (Recommendations For The Authors):

      1) Please use terms consistently and correctly. E.g., 'cognitive capacity' is not the same as 'educational attainment'.

      We thank the reviewer for the feedback regarding the consistency of terminology in our manuscript. Per the suggestion, we standardized the use of ‘cognitive capacity’ and now consistently refer to it as ‘cognitive phenotypes’ throughout our manuscript. Furthermore, we explicitly stated in the Introduction section that our two PGSs of focus will be termed ‘cognitive phenotypes PGSs’, aligning with terminology used in prior studies (Joo et al., 2022; Okbay et al., 2022; Selzam et al., 2019) [line 140~142].

      Joo, Y. Y., Cha, J., Freese, J., & Hayes, M. G. (2022). Cognitive Capacity Genome-Wide Polygenic Scores Identify Individuals with Slower Cognitive Decline in Aging. Genes, 13(8), 1320. doi:10.3390/genes13081320

      Okbay, A., Wu, Y., Wang, N., Jayashankar, H., Bennett, M., Nehzati, S. M., . . . Young, A. I. (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics, 54(4), 437-449. doi:10.1038/s41588-022-01016-z

      Selzam, S., Ritchie, S. J., Pingault, J.-B., Reynolds, C. A., O’Reilly, P. F., & Plomin, R. (2019). Comparing Within- and Between-Family Polygenic Score Prediction. The American Journal of Human Genetics, 105(2), 351-363. doi:https://doi.org/10.1016/j.ajhg.2019.06.006

      2) The authors study 'cognitive performance using seven instruments', but it is not clear how fluid and crystalline intelligence was defined/operationalized.

      Thank you for pointing this out. We specified the NIH Toolbox tests used for composite scores of fluid and crystallized intelligence, respectively. “We utilized baseline observations of uncorrected composite scores of fluid intelligence (Dimensional Change Card Sort Task, Flanker Test, Picture Sequence Memory Test, List Sorting Working Memory Test), crystallized intelligence (Picture Vocabulary Task and Oral Reading Recognition Test), and total intelligence (all seven instruments) provided in the ABCD Study dataset” [line 180~187].

      3) I don't think Lee 2018 is the largest GWAS for educational attainment. That would be Okbay 2022. It needs to be described how cognitive performance was defined in Lee 2018. Why did the authors not use the Trubetskoy 2022 schizophrenia GWAS?

      Thank you for mentioning this point. We were not able to use the largest GWAS for CP, EA, and schizophrenia because, unfortunately, our study started before the GWAS by Okbay et al. (2022) and Trubetskoy et al. (2022) were published. We corrected the text to state that our study used ‘a GWAS of European-descent individuals for educational attainment and cognitive performance’ instead of the largest GWAS [line 206~208].

      4) It is unclear how neighbourhood SES was coded. The authors seem to suggest that higher values indicate risk, but Figure 2 suggests that higher values links to higher intelligence and lower PLE.

      Thank you very much for pointing this out. Consistent with the description of neighborhood SES in the Methods section, higher values of neighborhood SES indicate risk. In the original Figure 2, higher values of neighborhood SES link to lower intelligence (direct effects: β=-0.1121) and higher PLEs (indirect effects: β=-0.0126~-0.0162). We think the confusion might have been caused by the difference in coding between family SES (higher values = lower risk) and neighborhood SES (higher values = higher risk). Thus, we changed the terms to ‘High Family SES’ and ‘Low Neighborhood SES’ in the corrected figure (Figure 3) for clarification.

      5) Also, the 'year of residence' variable is unclearly defined. Does this mean that a shorter duration of residency (even in a good neighbourhood) indicate risk?

      Thank you for mentioning this point. Considering that shorter duration of residence may be associated with instability of residency, it may indicate neighborhood adversity (i.e., higher risk). This definition of the ‘years of residence’ variable is in line with the previous study by Karcher et al. (2021).

      Karcher, N. R., Schiffman, J., & Barch, D. M. (2021). Environmental Risk Factors and Psychotic-like Experiences in Children Aged 9–10. Journal of the American Academy of Child & Adolescent Psychiatry, 60(4), 490-500. doi:10.1016/j.jaac.2020.07.003

      6) Please provide information on how correlated the two PGSes were.

      Thank you for your suggestion. We added the Pearson’s correlation between the two PGSs (r=0.4331, p<0.0001) in the Methods section for PGS [line 214].

      7) Information on the outcome variable in the 'linear mixed models' section is missing. I assumed it was PLE.

      Thank you for notifying us of this point. We added the information on the outcome variables in the section for linear mixed models [line 242~244].

      8) In the 'Path Modeling' section, please explain what 'factors and components' concretely refer to. How is this different from a standard SEM with latent factors?

      Thank you for your comment on the need to elaborate on the IGSCA method. We added that, unlike standard SEM methods, which use only latent factors, the IGSCA method can use components as well as latent factors as constructs in model estimation. This allows the IGSCA method to control bias in estimation more effectively than standard SEM [line 261~268].

      9) The sentence starting line 229 is unclear. Does this mean variables were not used to generate latent factors. And if not, what weights were used to create a 'weighted sum'?

      Thank you for mentioning this point. The sentence means that we treated PGSs, family SES, neighborhood SES, positive family and school environment, and PLEs as composite indicators (derived from a weighted sum of relevant observed variables), while general intelligence was represented as a latent factor.

      Prior studies suggest that these variables (PGSs, family SES, neighborhood SES, positive family and school environment, and PLEs) are less likely to share a common factor and are typically assessed as composite indices. For instance, Judd et al. (2020) and Martin et al. (2015) analyzed the genetic influences on educational attainment and ADHD as composite indicators. Also, as mentioned in Judd et al. (2020), socioenvironmental influences are often analyzed as composite indicators. Studies on the psychosis continuum (e.g., van Os et al., 2009) suggest that psychotic disorders are likely to have multiple background factors rather than a single common factor, and note that numerous prior studies use composite indices to measure psychotic symptoms. Based on this literature, we used components for these variables.

      The IGSCA determines weights of each observed variable to maximize the variances of the endogenous indicators and components [added in line 265~268].

      On the other hand, we treated general intelligence as a latent factor/variable underlying fluid and crystallized intelligence. This is based on the extensive literature of classical g theory of intelligence [added in line 269~284].

      Judd, N., Sauce, B., Wiedenhoeft, J., Tromp, J., Chaarani, B., Schliep, A., ... & Klingberg, T. (2020). Cognitive and brain development is independently influenced by socioeconomic status and polygenic scores for educational attainment. Proceedings of the National Academy of Sciences, 117(22), 12411-12418.

      Martin, J., Hamshere, M. L., Stergiakouli, E., O'Donovan, M. C., & Thapar, A. (2015). Neurocognitive abilities in the general population and composite genetic risk scores for attention‐deficit hyperactivity disorder. Journal of Child Psychology and Psychiatry, 56(6), 648-656.

      van Os, J., Linscott, R., Myin-Germeys, I., Delespaul, P., & Krabbendam, L. (2009). A systematic review and meta-analysis of the psychosis continuum: Evidence for a psychosis proneness–persistence–impairment model of psychotic disorder. Psychological Medicine, 39(2), 179-195. doi:10.1017/S0033291708003814
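      The notion of a composite indicator as a weighted sum of observed variables, described in the response to comment 9, can be sketched as follows. This is a simplified illustration only: the weights here are fixed, hypothetical values chosen by hand, whereas the response states that IGSCA estimates the weights from the data. The function names and toy values are assumptions and do not come from the manuscript.

```python
import statistics


def standardize(values):
    """Convert raw values to z-scores (mean 0, population SD 1)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mean) / sd for v in values]


def composite_scores(indicators, weights):
    """Return one composite score per participant.

    indicators: dict mapping indicator name -> list of raw values
    weights:    dict mapping indicator name -> weight (hypothetical here)
    """
    lengths = {len(v) for v in indicators.values()}
    if len(lengths) != 1:
        raise ValueError("all indicators must have the same number of rows")
    n = lengths.pop()
    z = {name: standardize(vals) for name, vals in indicators.items()}
    # Weighted sum of standardized indicators, participant by participant
    return [sum(weights[name] * z[name][i] for name in z) for i in range(n)]


# Hypothetical toy values for two polygenic scores (illustration only)
pgs_indicators = {
    "cp_pgs": [0.2, -1.1, 0.7, 1.5, -0.4],
    "ea_pgs": [0.5, -0.8, 0.1, 1.2, -0.9],
}
hypothetical_weights = {"cp_pgs": 0.5, "ea_pgs": 0.5}
pgs_component = composite_scores(pgs_indicators, hypothetical_weights)
print(pgs_component)
```

      A latent factor, by contrast, is not computed directly from the indicators in this way; it is modeled as an unobserved common cause of its indicators.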

      10) It is overall not clear when genetically and when self-reported information of ethnicity was used. This needs to be clearer throughout.

      Thank you for mentioning this point. We only used genetically defined ethnicity, and we have not mentioned that we used self-reported ethnicity. Per your suggestion, we clarified that we used ‘genetic ethnicity’ throughout the paper.

      11) The sentence starting line 253 is also unclear. How is schizophrenia PGS a 'more direct genetic predictor of PLE' and compared to what other measure?

      Thank you for pointing this out. Please note that our adjustment (or sensitivity analyses) was based on the reported associations between PLEs and the risk for schizophrenia: schizophrenia PGS is associated with a cognitive deficit in psychosis patients (Shafee et al., 2018) and individuals at-risk of psychosis (He et al., 2021), and with psychotic-like experiences (more so than PGS for psychotic-like experiences) (Karcher et al., 2018). We added these references for clarification [line 307~309]. We believe that, because of this adjustment, our results from the mixed linear model show the sensitivity and specificity of the association between cognitive phenotype PGS and PLEs.

      He, Q., Jantac Mam-Lam-Fook, C., Chaignaud, J., Danset-Alexandre, C., Iftimovici, A., Gradels Hauguel, J., . . . Chaumette, B. (2021). Influence of polygenic risk scores for schizophrenia and resilience on the cognition of individuals at-risk for psychosis. Translational Psychiatry, 11(1). doi:10.1038/s41398-021-01624-z

      Karcher, N. R., Paul, S. E., Johnson, E. C., Hatoum, A. S., Baranger, D. A. A., Agrawal, A., . . . Bogdan, R. (2021). Psychotic-like Experiences and Polygenic Liability in the Adolescent Brain Cognitive Development Study. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. doi:https://doi.org/10.1016/j.bpsc.2021.06.012

      Shafee, R., Nanda, P., Padmanabhan, J. L., Tandon, N., Alliey-Rodriguez, N., Kalapurakkel, S., . . . Robinson, E. B. (2018). Polygenic risk for schizophrenia and measured domains of cognition in individuals with psychosis and controls. Translational Psychiatry, 8(1). doi:10.1038/s41398-018-0124-8

      12) Please include a statement on the assumptions made when using the method used in this study and developed by Miao 2022, explain what evidence you have to support these assumptions and how this method, which I believe was developed for RCTs, can be applied to observational data.

      We specified the assumptions for the causal inference method proposed by Miao et al. (2022) and why it is applicable to our study. Also, we noted that this novel method was developed to identify the causal effects of multiple treatment variables within non-randomized, observational data [line 309~319].

      13) Some of the statements are potentially misleading. E.g., I would be very cautious to claim that the methods applied allowed the authors to estimate 'unbiased associations again potential (even unobserved) confounding variables'. There are many concerns such as selection bias, attrition, reverse causation, genetic confounding, etc that cannot be addressed satisfactorily using these data and methods.

      Thank you for pointing this out. We deleted statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead.

      Nevertheless, please note that, due to some limitations in the data (e.g., confounders), an analytic approach should be robust enough to handle potential violations of assumptions. This was the point we wanted to emphasize: in contrast to the majority of studies using the ABCD study, which employ simplistic GLM or conventional SEM with only latent variable modeling, our study provides less biased, and thus more accurate, estimates through the use of sophisticated modeling for confounding effects (instead of simplistic GLM) and IGSCA (instead of conventional simplistic SEM). We hope our study may help improve analytical approaches in this field.

      14) I would be equally cautious to claim that the ABCD study is representative. Please add information on the whole ABCD cohort to Table 1 and describe any relevance with respect to attrition effects or representativeness.

      Thank you for highlighting this issue. We previously characterized the ABCD Study as representative of the US population, given its aim to ensure representativeness by recruiting from a broad range of school systems located near each of its 21 research sites, chosen for their geographic, demographic, and socioeconomic diversity. Using epidemiological strategies, a stratified probability sample of schools was selected for each site. This procedure took into account sex, race/ethnicity, socioeconomic status, and urbanicity to reduce potential sampling biases at the school level. Based on these strategies, previous research (e.g., Thompson et al., 2019; Zucker et al., 2018) has referred to the ABCD Study as ‘representative.’ However, we overlooked the fact that “not all 9-year-old and 10-year-old children in the United States had an equal chance of being invited to participate in the study,” and therefore, it should not be deemed fully representative of the US population (Compton et al., 2019). Heeding your suggestion, we have removed all descriptions of the ABCD Study being representative.

      Compton, W. M., Dowling, G. J., & Garavan, H. (2019). Ensuring the Best Use of Data: The Adolescent Brain Cognitive Development Study. JAMA Pediatrics, 173(9), 809-810. doi:10.1001/jamapediatrics.2019.2081

      Thompson, W. K., Barch, D. M., Bjork, J. M., Gonzalez, R., Nagel, B. J., Nixon, S. J., & Luciana, M. (2019). The structure of cognition in 9 and 10 year-old children and associations with problem behaviors: Findings from the ABCD study’s baseline neurocognitive battery. Developmental Cognitive Neuroscience, 36, 100606. doi:10.1016/j.dcn.2018.12.004

      Zucker, R. A., Gonzalez, R., Feldstein Ewing, S. W., Paulus, M. P., Arroyo, J., Fuligni, A., . . . Wills, T. (2018). Assessment of culture and environment in the Adolescent Brain and Cognitive Development Study: Rationale, description of measures, and early data. Developmental Cognitive Neuroscience, 32, 107-120. doi:https://doi.org/10.1016/j.dcn.2018.03.004

      15) The imputation methods need to be explained in more detail / more clearly. What concrete variables were included? Why was 50% of the sample excluded despite imputation? How similar is the study sample to the overall ABCD cohort - and to the US population in general (i.e., is this a representative dataset)?

      Thank you for mentioning this point. We clarified the method and detailed processes of the imputation (e.g., R package VIM, number of missing observations for each study variables such as genotypes, follow-up observations, and positive environment) [Methods; line 167~176].

      The final samples had significantly higher cognitive intelligence, parental education, family income, and family history of psychiatric disorders, lower Area Deprivation Index, percentage of individuals below -125% of the poverty level, and family’s financial adversity (p<0.05). As you have noted above, these results also show the limited representativeness of the data used in our study. We fully acknowledge that our study sample, as well as the overall ABCD cohort, is not representative of the US population in general.

      16) There are a range of unclear statements (e.g., 'Supportive parenting and a positive school environment had the largest total impact on PLEs than genetic or environmental factors' - isn't parenting an environmental factor?).

      Thank you for mentioning this point. We clarified seemingly vague expressions and unclear statements. We corrected the sentence you noted as ‘Supportive parenting and a positive school environment had the largest total impact on PLEs than any other genetic or environmental factors’ [line 57~58].

      17) The authors' conclusion (that these findings have policy implications for improving school and family environmental) are not fully supported by the evidence. E.g., genetic effects were equally large.

      Thank you for pointing this out. Our description should be clearer. Our models consistently show that the combined environmental effects of positive family/school environment and family/neighborhood SES exceed the genetic effects. We suggest that these findings may have policy implications for “improving the school and family environment and promoting local economic development” [line 62~64].

      To clarify, we newly added “Despite the undeniable genetic influence on PLEs, when we combine the total effect sizes of neighborhood and family SES, as well as positive school environment and parenting behavior (∑|β|=0.2718~0.3242), they considerably surpass the total effect sizes of cognitive phenotypes PGSs (|β|=0.0359~0.0502)” [line 510~513]. Based on these results, we suggest that our findings hold potential policy implications for “preventative strategies that target residential environment, family SES, parenting, and schooling—a comprehensive approach that considers the entire ecosystem of children's lives—to enhance children's cognitive ability and mental health” in the Discussion [line 507~510].

      Admittedly, our results do not directly demonstrate a causal effect wherein an intervention in the school or family environmental variables would necessarily lead to a significantly meaningful positive impact on a child's cognitive intelligence and mental health. We do not make such a claim in this paper. However, we anticipate that further integrative analyses akin to ours might help identify potential causal or prescriptive effects. We hope this perspective will be recognized as one of the contributions of our study. We leave the final decision to the discerning judgment of the editors and reviewers.

      18) Many citations do not support the statements made and are sometimes used rather vaguely. For example, I believe Judd 2020 and Okbay 2022 did not use a PGS of cognitive capacity, but of educational attainment. Plomin 2018 and Harden 2020 are reviews, but the primary studies should be cited instead. Which reference exactly is supporting the statement that cognitive capacity PGS links to brain morphometry?

      Thank you very much for your precise observations. We thoroughly checked all citations and updated the references for each statement.

      We deleted Plomin & von Stumm (2018) and Harden & Koellinger (2020) and cited relevant original research articles (e.g., Lee et al., 2018; Okbay et al., 2022; Abdellaoui et al., 2022) instead. We also specified the references supporting the statement that educational attainment PGS links to brain morphometry (Judd et al., 2020; Karcher et al., 2021). As Okbay et al. (2022) used the PGS of cognitive intelligence (with the results presented in their supplementary materials) as well as educational attainment, we decided to continue citing this reference [line 131~141].

      19) Citations are formatted inconsistently.

      We apologize for the inconsistency of the citation formatting. We formatted all citations in APA 7th style, using EndNote v20. We checked that all citations maintain consistency according to the reference style.

      20) Re line 281, I believe effect sizes are 'up to twice as large', but not consistently twice as large as suggested in the text.

      Thank you for mentioning this point. We corrected the sentence as ‘The effect sizes of EA PGS on children's PLEs were larger than those of CP PGS’ [line 342~343].

      21) Please add to the results a short statement on what covariates these analyses were controlled for.

      Thank you for giving us this comment. We added that we used sex, age, marital status, BMI, family history of psychiatric disorders, and ABCD research sites as covariates in the Results section [line 329~331].

      22) Cho 2020 does not provide recommendations on FIT values (line 315). Please provide another reference and explain how these FIT values should be interpreted.

      Thank you for mentioning this point. We added the correct reference for FIT values (Hwang, Cho, & Choo, 2021). We also added that the FIT values range from 0 to 1, and a larger FIT value indicates more variance of all variables is explained by the specified model (e.g., FIT=0.50 denotes that the model explains 50% of the total variance of all variables) [line 291~293].
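      To make the interpretation of FIT concrete, the following is a simplified sketch of a "proportion of total variance explained" index computed across several variables at once. It is an assumption-laden illustration, not the IGSCA estimator: it takes model-predicted values as given and only shows why a value of 0.50 would correspond to 50% of the total variance being explained. The function name and toy numbers are hypothetical.

```python
def fit_index(observed, predicted):
    """Proportion of total variance explained across all variables.

    observed, predicted: lists of rows, one row per variable, each row a
    list of values for the same participants. Each observed variable is
    centered on its own mean before computing the total sum of squares.
    """
    ss_total = 0.0
    ss_residual = 0.0
    for obs_row, pred_row in zip(observed, predicted):
        if len(obs_row) != len(pred_row):
            raise ValueError("observed and predicted rows must match in length")
        mean = sum(obs_row) / len(obs_row)
        for o, p in zip(obs_row, pred_row):
            ss_total += (o - mean) ** 2
            ss_residual += (o - p) ** 2
    if ss_total == 0:
        raise ValueError("total variance is zero")
    return 1.0 - ss_residual / ss_total


# Hypothetical toy data: one variable, four participants
observed = [[1.0, 2.0, 3.0, 4.0]]
predicted = [[1.5, 1.5, 3.5, 3.5]]
print(fit_index(observed, predicted))  # total SS = 5, residual SS = 1
```

      In this toy case the residual sum of squares is one fifth of the total, so the index is 0.8 up to floating-point rounding, meaning that 80% of the variance is explained.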

      23) Regarding Figure 2, please add factor loadings to this figure and explain what the difference between the hexagon and circular shapes are. Please also add the autocorrelations between the 3 PLE measures. I assume these were also modelled statistically, given the strong correlations between time points?

      Figure 2B needs reworking.

      It is unclear what the x-axis of Figure 2C represents. Proportion of R2 or effect size? SM table 2 provides key information, which should be added to Figure 2.

      Thank you for pointing this out. We added factor loadings to the corrected figure (Figure 3A and 3B). We also added that the X-axis of Figure 3C represents standardized effect sizes.

      24) I suggest adding units directly to Table 1, not in the legend. Was genetic or self-reported ethnicity used in this table? List age in years, not months?

      Thank you for your suggestion. We added the units of age and family history of psychiatric disorders directly inside Table 1. We used genetic ethnicity in Table 1, as we only used genetic ethnicity (but not self-reported ethnicity) throughout our study. This is noted on the last row of Table 1. We listed age in chronological months, which is how each child’s age at each point of data collection is coded in the ABCD Study.

      25) Please include exact p-values in Table 2.

      Thank you for your suggestion. We highly appreciate the reviewer’s comment on the importance of showing exact p-values in the analysis results. Unfortunately, we cannot estimate the standard errors based on normal-theory approximations to obtain the exact p-values of our IGSCA model results. This is described in detail in the original paper of the IGSCA method (Hwang et al., 2021): “Like GSCA and GSCAM, IGSCA is also a nonparametric or distribution-free approach in the sense that it estimates parameters without recourse to distributional assumptions such as multivariate normality of indicators. As a trade-off of no reliance on distributional assumptions, it cannot estimate the standard errors of parameter estimates based on asymptotic (normal-theory) approximations. Instead, it utilizes the bootstrap method (Efron, 1979, 1982) to obtain the standard errors or confidence intervals of parameter estimates nonparametrically.”

      Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1–26. http://dx.doi.org/10.1214/aos/1176344552

      Efron, B. (1982). The jackknife, the bootstrap and other resampling plans. Philadelphia, PA: SIAM. http://dx.doi.org/10.1137/1.9781611970319

      Hwang, H., Cho, G., Jung, K., Falk, C. F., Flake, J. K., Jin, M. J., & Lee, S. H. (2021). An approach to structural equation modeling with both factors and components: Integrated generalized structured component analysis. Psychological Methods, 26(3), 273-294. doi:10.1037/met0000336
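      The nonparametric bootstrap mentioned in the quotation above can be sketched in a few lines. This is a generic percentile-bootstrap illustration, assuming a simple statistic (the sample mean) and hypothetical toy data; it is not the IGSCA implementation, and the function name, number of resamples, and seed are assumptions made for the example.

```python
import random
import statistics


def bootstrap_se_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Return (standard error, (lower, upper)) for a statistic via the bootstrap.

    Rows are resampled with replacement; the standard error is the standard
    deviation of the resampled estimates, and the confidence interval is
    taken from their empirical percentiles.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    se = statistics.stdev(estimates)
    lower_idx = max(0, int((alpha / 2) * n_boot))
    upper_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return se, (estimates[lower_idx], estimates[upper_idx])


# Hypothetical toy observations (illustration only)
toy_data = [2.1, 3.4, 1.9, 4.2, 2.8, 3.7, 3.0, 2.5, 3.9, 2.2]
se, (ci_low, ci_high) = bootstrap_se_ci(toy_data, statistics.fmean)
print(f"SE = {se:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

      A parameter whose bootstrap confidence interval excludes zero is conventionally treated as statistically significant at the corresponding level, which is how significance can be reported without normal-theory p-values.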

      26) There are way too many indigestible tables presented in the supplementary materials, which are also not referenced in the main manuscript.

      We appreciate your insightful observation. As you rightly identified, we inadvertently failed to reference Table S2 in the main text. We have since corrected this omission in the Results section for the IGSCA (SEM) analysis [line 376]. The remainder of the supplementary tables (Table S1, S3~S7) have been appropriately cited in the main manuscript. We recognize that the quantity of tables provided in the supplementary materials is substantial. However, given the comprehensiveness and complexity of our analyses, these tables offer intricate results from each analysis. We deem these results, which include valuable findings from sensitivity analyses and confound testing, too significant to exclude from the supplementary materials. That said, we are open to, and would greatly welcome, any further suggestions to ensure clarity and ease of comprehension. Your guidance in this matter is highly valued.

      27) Figure S1 is unclear, possibly due to the journal formatting. Is this one figure presented on two pages? Clarify which PGS is listed in Figure S1 and in any case, please add both PGSs.

      Thank you for mentioning this point. Figure S1 presents two correlation matrices: the first is the correlation matrix of the component / factor variables in the IGSCA model, and the second is that of the observed variables used to construct the relevant component / factor variables in the IGSCA model. We have labeled the matrices Figure S1-A and Figure S1-B. We also corrected the figure legend to read “A. Correlation between all component / factor variables of the IGSCA model. B. Correlation between all observed variables used to construct the relevant component / factor variables in the IGSCA model.” Since Figure S1-A presents correlations between the components and latent factors, it lists a single PGS variable constructed from the CP PGS and EA PGS. Figure S1-B, on the other hand, presents correlations between the observed variables, so both the CP PGS and the EA PGS are listed in this correlation matrix.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study expands on current knowledge of allosteric diversity in the human kinome by C-terminal splicing variants using as a paradigm DCLK1. The authors provide solid evolutionary and some mechanistic evidence how C-terminal isoform specific variants generated by alternative splicing can regulate catalytic activity by means of coupling specific phosphorylation sites to dynamical and conformational changes controlling active site and substrate pocket occupancy, as well as protein-protein interactions. The data will be of interest to researchers in the kinase and signal transduction field.

      We thank the editor for coordinating the review of our manuscript and the reviewers for their valuable feedback. We have significantly revised the manuscript in response to the reviewers’ comments. Our point-by-point response to each comment is presented below. We have uploaded both a clean draft of our revised manuscript and a version with the revisions highlighted in yellow. We hope the revised manuscript is now acceptable for publication in eLife. We have additionally updated the preprint on bioRxiv and have included the link here: https://www.biorxiv.org/content/10.1101/2023.03.29.534689v2.

      Reviewer #1

      Summary

      In the study by Venkat et al. the authors expand the current knowledge of allosteric diversity in the human kinome by c-terminal splicing variants using as a paradigm DCLK1. In this work, the authors provide evolutionary and some mechanistic evidence about how c-terminal isoform specific variants generated by alternative splicing can regulate catalytic activity by means of coupling specific phosphorylation sites to dynamical and conformational changes controlling active site and substrate pocket occupancy, as well as interfering with protein-protein interacting interfaces that altogether provides evidence of c-terminal isoform specific regulation of the catalytic activity in protein kinases.

      The paper is overall well written, the rationale and the fundamental questions are clear and well explained, and the evolutionary and MD analyses are very detailed and well explained. The methodology applied in terms of the biochemical and biophysical tools falls a bit short in some places, and some comments and suggestions are given in this respect. It would be very useful if the authors could monitor protein auto-phosphorylation as a functional readout, for example by using phospho-specific antibodies to monitor activity. Overall I think this is a study that brings some new aspects and concepts that are important for the protein kinase field, in particular the allosteric regulation of the catalytic core by c-terminal segments, and how evolutionary cues generate more sophisticated mechanisms of allosteric control in protein kinases. However, a revision is recommended.

      Major Comments

      The authors explain in the introduction the role of the T688 autophosphorylation site in the function of DCLK1.2. This site, when phosphorylated, has a detrimental impact on catalytic activity and inhibits phosphorylation of the DCX domain, allowing the interaction with microtubules. In the paper they show how this site is generated by alternative splicing and intron skipping in DCLK1.2. However, there is no further functional evidence in the functional experiments presented in this study.

      1) What is the effect of a non-phosphorylable T688 mutant in terms of stability and enzymatic activity? What would be the impact of this mutant in the overall auto-phosphorylation reaction?

      The role of T688 phosphorylation on DCLK1 functions has been explored in previous studies (Agulto et al, 2020: PMID: 34310279), although it is only relevant to DCLK1.2 splice variants, since this site is lacking in DCLK1.1. These studies showed that mutation of T688 to an alanine increases total kinase autophosphorylation (i.e., autoactivity) and the subsequent phosphorylation of DCX domains, which in turn decreases microtubule binding. Given this information, our goal was to use an evolutionary perspective to investigate this, alongside less well-characterized aspects of DCLK autoregulation, including co-conserved residues in the catalytic domain and C-terminal tail. However, to address the reviewer’s question about a non-phosphorylatable T688 mutant, we performed MD simulations of the T688A and T688E (a phosphomimic) mutants and include a new supplementary figure (Figure 5-supplement 3), which shows that the two mutants slightly destabilize the C-tail relative to wt (1 and 2 angstrom increases in RMSF for T688E and T688A, respectively), but by themselves cannot dislodge the C-tail from the ATP binding pocket. Thus, other co-conserved interactions, as revealed by our analysis, are likely to contribute to the autoregulation of the kinase domain by the C-terminal tail. We have incorporated these observations into the revised results section.

      Furthermore, to address the reviewer’s question in terms of site-specific autophosphorylation as a marker of DCLK1.2 activity, we have now performed a much more detailed phosphoproteomic analysis of a panel of purified DCLK1.2 proteins after purification from E. coli (Figure 8-figure supplement 2). This showed that we are only able to detect phosphorylated Thr 688 in our ‘activated’ DCLK1.2 mutants, and not in the autoinhibited WT DCLK1.2 version of the protein. This apparent contradiction does not necessarily discount Thr 688 as an important regulatory hotspot, but, together with the MD simulations, may imply a smaller contribution of pThr 688 in facilitating/maintaining DCLK1.2 auto-inhibition than previously anticipated, especially in the context of the numerous other stabilizing amino acid contacts that we describe between the C-tail and the ATP-binding pocket. We do, however, propose a mechanism for pThr688 as a potential ATP mimic based on MD analysis. However, we only found MS-based evidence for phosphorylation at this (and other sites in the same peptide) in highly active DCLK1.2 mutants, in which the C-tail remains uncoupled from the ATP-binding site, even in the presence of this regulatory PTM. We acknowledge that a better understanding of DCLK biology will require a detailed appraisal of how the DCLK auto-inhibited states are subsequently physiologically regulated (PTMs, protein-protein interaction etc.), but this is beyond the scope of our current evolutionary investigation, and the absence of phosphospecific antibodies makes this challenging currently. We intend to expand upon our current work by assessing the relative contribution of multiple DCLK phosphorylation sites (including, but not limited to, Thr 688) with regard to cellular DCLK auto-regulation in future studies, in part by generating such site-specific phospho-antibodies.
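
      For readers unfamiliar with the RMSF values quoted in the response to Major Comment 1, the quantity can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the function name `rmsf` and the toy trajectory are hypothetical, and the frames are assumed to be already superimposed on a common reference, which a real MD analysis would do first.

```python
import math


def rmsf(trajectory):
    """Root-mean-square fluctuation per atom.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples,
    one per atom, assumed to be pre-aligned (hypothetical input format).
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [
            sum(frame[i][k] for frame in trajectory) / n_frames
            for k in range(3)
        ]
        # Mean squared deviation of atom i from its average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


if __name__ == "__main__":
    # Two made-up frames: atom 0 is static, atom 1 oscillates along x.
    toy = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    ]
    print(rmsf(toy))
```

      A larger RMSF for the C-tail residues of a mutant than for the wild type would indicate a more mobile tail, which is the sense in which the response describes the mutants as slightly destabilizing.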

      2) Have the authors made an equivalent T687/688 tandem in DCLK1.1 instead of the two prolines?

      This is a good point. We have not considered introducing a T687/688 tandem mutation into DCLK1.1 (at the equivalent position to that of DCLK1.2), primarily because the amino acid composition of their respective C-tail domains are so highly divergent across the tail (due to alternative splicing, as discussed in our paper). As discussed in our present study, there are numerous contacts made between specific amino acids in the regulatory C-tail and the kinase domain of DCLK1.2, which functionally occlude ATP binding, and thus change catalytic output. It is these contacts, which are determined by the specific amino acid sequence identity, and not the extended length of the DCLK1.2 C-tail per se, that drives autoinhibition. The alternate amino acid sequence identity of the C-tail of DCLK1.1 does not enable such contacts to form, which we believe explains the different activities of the two isoforms.

      Furthermore, our mutational analysis reveals clearly that Thr688 and several other sites are more highly autophosphorylated in the artificially activated DCLK1.2 constructs than WT DCLK1.2, and as such it remains our hypothesis that introduction of the tandem phosphorylation sites into DCLK1.1 is unlikely to be sufficient to impose an auto-inhibitory conformation of the enzyme.

      3) Could T688 autophosphorylation be used as a functional readout to evaluate DCLK1.2 activity?

      We agree with the reviewer’s suggestion about using autophosphorylation (including potentially Thr688 for DCLK1.2) as a functional read out for DCLK1 activity. In our present study, we identify phosphorylated peptides containing pThr688 only in the mutationally activated DCLK1.2 variants. We have now taken this analytical approach further and performed a detailed comparative phosphoproteomic characterisation of all of our DCLK1 constructs, where we observe marked differences in the overall phosphorylation profiles of the mutant DCLK1.2 (and DCLK1.1) proteins relative to the less phosphorylated WT DCLK1.2 kinase. This manifests as a depletion in the total number of confidently assigned phosphorylation sites within the kinase domain and C-tail of WT DCLK1.2, and also as a depletion in the abundance of phosphorylated peptides for a given site. To help visualise this, individual phosphorylation sites have been schematically mapped onto DCLK1, which has been included as a new extended supplementary figure (Figure 8-figure supplement 2). For comparative analysis of phosphosite abundance, we could only select peptides that could be directly compared between all mutants (identical amino acid sequences) and those found to be phosphorylated in all proteins (these are Ser660 and Thr438); these are now shown in figure supplement 2 as a table. These site occupancies follow what we see with respect to the increased catalytic activity between DCLK1.1 and DCLK1.2 mutants versus DCLK1.2. We also detect increased phosphorylation of DCLK1.1 and activated DCLK1.2 mutants in comparison to (autoinhibited) DCLK1.2, supporting the hypothesis that these mutants are relieving the autoinhibited conformation.

      4) What is the evidence that the c-terminal specific interactions described here are intra-molecular rather than inter-molecular? Have the authors looked at the monodispersity and molecular mass in solution of the different proteins evaluated in this study? Basically, are the proteins in solution monomers or dimers/oligomers?

      Analysis of symmetry mates in the crystal structure of DCLK1.2 (PDB ID: 6KYQ) provides no evidence for inter-molecular interactions. Furthermore, to evaluate oligomerization status in solution, we conducted analytical size exclusion chromatography (SEC), and our analysis reveals that both DCLK1.1 and DCLK1.2 predominantly exist as monomers in solution (Figure 3-Supplements 1-3). These results suggest that the C-terminal tail interactions are primarily intra-molecular.

      5) (Figure 3) Did the authors look at the monodispersity of the protein preparation? Did the SEC profile show a single peak or multiple peaks? Could the authors show the chromatogram? How many species do you have in solution? Was the tag removed from the recombinant proteins or not?

      Yes, as mentioned above, the SEC profile resulted in a single peak for both DCLK1.1 and DCLK1.2, which was confirmed as DCLK1 by subsequent SDS-PAGE. We have included the chromatogram and gels in supporting data (Figure 3-supplements 1-3) in the revised manuscript and updated the Methods section. ‘The short N-terminal 6-His affinity tag present on all other DCLK1 proteins described in this paper was left in situ on recombinant proteins, since it does not appear to interfere with DSF, biochemical interactions or catalysis.’

      6) The authors should perform Michaelis-Menten saturation kinetics, as shown in Figure 3C with the WT, when comparing all the functional variants analysed in the study, so that we can compare the catalytic rates and enzymatic constants kcat, Km and catalytic efficiency (kcat/Km), which should also be presented in a table.

      Thank you for your suggestion. We have performed the requested comparative kinetics analyses for selected functional DCLK1 variants at the same concentration as suggested, using our real-time assay to determine Vmax for peptide phosphorylation as a function of ATP, but at a fixed substrate concentration (we are unable to assess Vmax above 5 µM peptide for technical reasons). The results of these analyses have been included in the revised version of Figure 8-Supplement 1, where they support differences in both Vmax and Km[ATP]; the ratio of these values very clearly points to differences in activities falling into ‘low’ or ‘high’. This kinetic analysis fully supports our initial activity assays, where mutations predicted to uncouple the auto-inhibitory C-tail rescue DCLK1.2 activity to levels similar to DCLK1.1 towards a common substrate.
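
      As an illustration of how Vmax and Km can be extracted from rate measurements at varying ATP concentrations, the sketch below fits the Michaelis-Menten equation v = Vmax[S] / (Km + [S]) using the double-reciprocal (Lineweaver-Burk) linearization, 1/v = (Km/Vmax)(1/[S]) + 1/Vmax. This is not the authors' fitting procedure: the function names, concentrations, and parameter values are hypothetical, and the linearization is chosen only because it needs no external libraries. With noisy data, a nonlinear least-squares fit is generally preferable.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_lineweaver_burk(substrate, rates):
    """Estimate (vmax, km) by ordinary least squares on 1/v versus 1/[S].

    Hypothetical helper; all concentrations and rates must be positive.
    """
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept  # intercept = 1 / Vmax
    km = slope * vmax       # slope = Km / Vmax
    return vmax, km


if __name__ == "__main__":
    # Made-up, noise-free data generated from Vmax = 10 and Km = 50
    # (arbitrary units), purely to show that the fit recovers them.
    atp = [5.0, 10.0, 25.0, 50.0, 100.0, 250.0]
    v = [michaelis_menten(s, 10.0, 50.0) for s in atp]
    vmax, km = fit_lineweaver_burk(atp, v)
    print(f"Vmax = {vmax:.3f}, Km = {km:.3f}, Vmax/Km = {vmax / km:.3f}")
```

      The ratio Vmax/Km printed at the end corresponds to the kind of ratio the response uses to separate variants into 'low' and 'high' activity classes.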

      Minor Comments

      It is very interesting how the IBS together with the pT688 mimics ATP in the case of DCLK1.2 to reach full occupancy of the active site. On Figure 8 you evaluate residues of the GRL and IBS interface to probe such interactions.

      1) Did the authors look at the T688 non-phosphorylable mutant?

      See our response to Major Comment 1 above. In addition, due to the absence of T688 in DCLK1.1, we did not look at the T688A mutant of DCLK1.2 biochemically, partly because it has been characterized in previous studies, and partly because this site is preceded by another Thr residue. The lack of a selective antibody towards this site makes it difficult to evaluate the role of T688 phosphorylation specifically with respect to DCLK cellular functions and interactions. Therefore, we focused our in vitro efforts on understanding how mutations in the IBS impact the catalytic activity of DCLK1.2 by comparing different variants to DCLK1.1.

      2) Classification of DCLK C-terminal regulatory elements.

      It would be useful to connect the different regulatory elements described in this study to a specific functional and biological setting where these different switches play a role e.g. microtubule interactions and dynamics, cell cycle, cancer, etc..

      While the primary focus of our paper is on the mechanism of allosteric regulation of DCLK1, we have indeed touched upon the potential implications of the various regulatory elements of the tail on functions such as microtubule binding and phenotypic effects like cancer progression. However, we acknowledge that a comprehensive understanding of these effects would necessitate a more detailed investigation. This could potentially involve the integration of RNA-seq data with extensive cell assays to evaluate phenotypic effects. We believe that such a future study would be a valuable extension of our current work and could provide further insights into the functional roles of DCLK1.

      3) (Figure 3) Could the authors explain the differences in yield between the WT and the D531A mutant. Apparently, it [the yield] does not appear to be caused by a lower stability as indicated by the Tm. Could the authors comment on this? It is important to compare different samples in parallel, in the same experiment and side by side. This applies to the thermal shift data comparing WT and a D531A mutant on panel D and also on panel C a comparison between WT and D531A as negative control should be shown.

      WT and D533A (kinase-dead) were indeed analysed in parallel, but have been split into two panels to make the data easier to interpret. The modest difference in yield is likely explained by experimental prep-to-prep variation. Our experience shows that many protein kinase yields vary between kinase and kinase-dead variants, likely due to bacterial toxicity related to enzyme activity. With regard to thermal stability, we would like to emphasize that Differential Scanning Fluorimetry (DSF) is, to our mind, a more informative and quantitative measure of protein stability than yield from bacteria, because purified proteins are assessed at the same concentration. We believe that the DSF data provide a more accurate representation of the real stability differences between the WT and D533A mutant.

    1. Background

      Ilan Gronau: This manuscript describes updates made to GADMA, which was published two years ago. GADMA uses likelihood-based demography inference methods as likelihood-computation engines, and replaces their generic optimization technique with a more sophisticated technique based on a genetic algorithm. The version of GADMA described in this manuscript has several important added features. It supports two additional inference engines, more flexible models, additional input and output formats, and it provides better values for the hyper-parameters used by the genetic algorithm. This is indeed a substantial improvement over the original version of GADMA. The manuscript clearly describes the different added features to GADMA, and then demonstrates them with a series of analyses. These analyses establish three main things: (1) they show that the new hyper-parameters improve performance; (2) they show how GADMA can be used to compare performance of different approaches to calculate data likelihood for demography inference; (3) they showcase new features of GADMA (supporting model structure and inbreeding inference). Overall, the presentation is very clear and the results are interesting and compelling. Thus, despite being a publication about a method update, it shows substantial improvement, provides interesting new insights, and will likely lead to expansion of the user base for GADMA.

      The only major comment I have is about the part of the study that optimizes the hyperparameters. The hyper-parameter optimization is a very important improvement in GADMA2. The setup for this analysis is very good, with three inference engines, four data sets used for training and six diverse data sets used for testing. However, because of complications with SMAC for discrete hyperparameters, the analysis ends up considering six separate attempts. The comparison between the hyper-parameters produced by these six attempts is mostly done manually across data sets and inference engines. This somewhat defeats the purpose of the well-designed setup. Eventually, it is very difficult for the reader to assess the expected improvement of the final suggested values of hyperparameters (attempt 2) relative to the default ones. I have two comments/suggestions about this part.

      First, I'm wondering if there is a formal way to compare the eventual parameters of the six attempts across the four training sets. I can see why you would need to run SMAC six separate times to deal with the discrete parameters. However, why do you not use the SMAC score to compare the final settings produced by these six runs?

      Second, as a reader, I would like to see a single table/figure summarizing the improvement you get using whatever hyper-parameters you end up suggesting in the end compared to the default setting used in GADMA1. This should cover all the inference engines and all the data sets somehow in one coherent table/figure. Using such a table/figure, you could report improvement statistics, such as the average increase in log-likelihood, or average decrease in convergence times. These important results get lost in the many improved figures and tables.

      These are my main suggestions for revisions of the current version. I also have some more minor comments that the authors may wish to consider in their revised version, which I list below.

      Introduction:

      para 2: the survey of demography inference methods focuses on likelihood-based methods, but there is a substantial family of Bayesian inference methods, such as MPP, Ima, and G-PhoCS. Bayesian methods solve the parameter estimation problem by Bayesian sampling. I admit that this is somewhat tangential to what GADMA is doing, but this distinction between likelihood-based methods and Bayesian methods probably deserves a brief mention in the introduction.

      para 2,3: you mention a result from the original GADMA paper showing that GADMA improves on the optimization methods implemented by current demography inference methods. Readers of this paper might benefit from a brief summary of the improvement you were able to achieve using the original version of GADMA. Can you add 2-3 sentences providing the highlights of the improvement you were able to show in the first paper?

      para 3: The statement "GADMA separates two regular components" is not very clear. Can you rephrase to clarify?

      Materials and methods - Hyper-parameter optimization:

      I didn't fully understand what you use for the cost function in SMAC here. Seems to me like there are two criteria: accuracy and speed. You wish the final model to be as accurate as possible (high log likelihood), but you want to obtain this result with few optimization iterations. Can you briefly describe how these two objectives are addressed in your use of SMAC? It's also not completely clear how results from different engines and different data sets are incorporated into the SMAC cost. Can you provide more details about this in the supplement?

      para 2: "That eliminate three combinations" should be "This eliminates three combinations".

      para 3: "Each attempt is running" should be "Each attempt ran"

      para 3: "We take 200×number of parameters as the stop criteria". Can you clarify? Does this mean that you set the number of GADMA iterations to 200 times the number of demographic model parameters? Why should it be a linear function of the number of parameters? The following text explains the justification, but

      Table 1: I would merge Table S2 with this one (by adding the ranges of all hyper-parameters as a first column). It's important to see the ranges when examining the different selections.

      Materials and methods - Performance test of GADMA2 engines:

      para 2: "ROS-STRUCT-NOMIG" should be "DROS-STRUCT-NOMIG". Also, "This notation could be read" - maybe replace by "This notation means" to signal that you're explaining the structure notation.

      Para 4 (describing comparisons for momi on Orangutan data): "ORAN-NOMIG model is compared with three …". You also consider ORAN-STRUCTNOMIG in the momi analysis, right?

      Results - Performance test of GADMA2 engines:

      Inference for the Drosophila data set under model with migration: you mention that the models with migration obtain lower likelihoods than the models without migration. You cannot directly compare likelihoods in these two models, since the likelihood surface is not identical. So, I'm not sure that the fact that you get higher likelihoods in the models without migration is a clear enough indication for model fit. The fact that the inferred migration rates are low is a good indication for that. It also seems like despite converging to models with very low migration rates, the other parameters are inferred with higher noise. For example, the size of the European bottleneck is significantly increased in these inferences compared to that of the NOMIG. So, potentially the problem here is that more time is required for these complex models to converge.

      Inference for the Drosophila data set under structured model (2,1): the values inferred by moments and momentsLD appear to neatly fit the true values. However, it is not straightforward to compare an exponential increase in population size to an instantaneous increase. Maybe this can be done by some time-averaged population size, or the average time until coalescence in the two models? This will allow you to quantify how well the two exponential models fit the true model with instantaneous increase.

      Inference for the Orangutan data set under structured model (2,1) without migration: you argue that a constant population size is inferred for Bor by moments and momi because of the restriction on population sizes after the split. You base this claim on a comparison between the log-likelihoods obtained in this model (STRUCT-NOMIG) and the standard model (NOMIG) in which you add this restriction. I didn't fully understand how you can conclude from this comparison that the constant size inferred for Bor is due to the restriction on the initial population size after the split. I think what you need to do to establish this is run the STRUCT model without this restriction and see that you get exponential decrease. Can you elaborate more on your rationale? A detailed explanation should appear in the supplement and a brief summary in the main text.

      Inference for the Orangutan data set with models with pulse migration: This is a nice result showing that the more pulses you include, the better the estimates become. However, your main example in the main text uses the inferred migration rates. This is a poor example, because migration rates in a pulse model cannot be compared to rates in a continuous model. If migration is spread along a longer time range, then you expect the rates to decrease. So, there is no expectation of getting the same rates. You do expect, however, to get other parameters reasonably accurate. It seems like this is done with 7 pulses, but not so much with one pulse. This should be the main focus of the discussion of these results.

      Results - inference of inbreeding coefficients:

      When you describe the results you obtained for the cabbage data set, you say "the population size for the most recent epoch in our results is underestimated (6 vs 592 individuals) for model 1 without inbreeding and overestimated (174,960,000 vs. 215,000 individuals) for model 2 with inbreeding". The usage of under/overestimated is not ideal here, because it would imply that the original dadi estimates are more correct. You should probably simply say that they are lower/higher than estimates originally obtained by dadi. Or maybe even suggest that the original estimates were over/underestimated?

      Supplementary materials:

      Page 4, para 2: "Figure ??" should be "Figure S1"

      Page 4, para 4: Can you clarify what you mean by "unsupervised demographic history with structure (2, 1)"?

      Page 22, para 2: "Compared to dadi and moments engines momentsLD provide slightly worse approximations for migration rates". I don't really see this in Supplementary Table S16. Estimates seem to be very similar in all methods. Am I missing anything? You make the same statement again in the STRUCT-MIG model (page 23).

      Page 22, para 4: "The best history for the ORAN-NOMIG model with restriction on population sizes is -175,106 compared to 174,309 obtained for the ORAN-STRUCT-NOMIG mod". There is a missing minus sign before the second log likelihood. You should also specify that this refers to the moments engine. Also see comment above about this result.

  2. learn-us-east-1-prod-fleet01-xythos.content.blackboardcdn.com
    1. Freedom for the Filipinos challenging US occupatio

      I think this is a really blunt message regarding the different meanings of freedom to people in different situations, and how the different ideas can conflict. It reveals that a differing idea of freedom can be seen as an attack on the freedom accepted by someone else. In this case, the US soldiers see the Filipinos' desire for independence as an infringement upon their ideas of freedom. It is up to society and each individual which side of the conflict they wish to be on. I think today, we can agree with the Filipinos' side, but there may be some who think otherwise. Personally, I struggle to understand the soldiers, but their social development and ideologies are entirely different from my own. So, in that regard, historical circumstances play a big role in understanding the many complexities of freedom. It's very complex and quite hard to articulate.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Kagermeier et al. present a novel and interesting study that attempts to model a severe neurodevelopmental disorder, pontocerebellar hypoplasia type 2a, using neocortical and cerebellar organoids. Brain organoids are an appropriate and promising approach to elucidate disease mechanisms in neurodevelopmental diseases. The authors show a reduction in the size of the organoids which is more pronounced in the cerebellar compared to neocortical organoids. While this finding is interesting and reminiscent of the clinical PCH2a phenotype, i.e., cerebellar hypoplasia, the study is very preliminary and the conclusions of the manuscript are not supported by the data. Additional information and further experiments are necessary to support the claims made.

      Major concerns:

      1. hiPSC lines show considerable inter- and intra-individual variability and therefore the size differences observed between these control and patient-derived organoids may arise from differences in the hiPSC lines used. While the data sufficiently demonstrates the pluripotency of the multiple novel hiPSC lines, major concerns remain as to the appropriateness of the control hiPSC lines. The manuscript should include a table describing the age and sex matching as well as mode of reprogramming for all control and patient lines. Patient and control lines should be matched as closely as possible. Furthermore, figure legends should clearly indicate which clones and lines are shown in the various figure panels.

      We agree with the reviewer that hiPSC variability is an important concern in the field. In order to minimize such effects, all iPSC lines used in this study were generated following the same protocol in the same lab. All cell lines are derived from male donors, thus eliminating sex-based variability. Further, there is no report of sex-based variance in the clinical phenotype of PCH2a children, and this finding is further corroborated by an ongoing natural history study in our research team. While it would be ideal to also have age-matched controls, this is not possible for ethical reasons, as skin biopsies from healthy children cannot easily be obtained to match the pediatric PCH2a cases. However, based on the literature, we believe that epigenetic age is erased upon reprogramming (Strassler et al 2018, Studer et al 2015). Following the reviewer’s recommendation, we provide a table that clearly indicates the origin of all six cell lines used (see Methods section), and information on the respective lines was added to the figure legends as suggested by the reviewer.

      As the hiPSC lines used are not isogenic, it is important that the authors characterise these lines further. This should include a quantification of the rates of proliferation and apoptosis in all used hiPSC lines, as these might impact the growth rate of the embryoid bodies / organoids.

      We thank the reviewer for raising this concern. To address the variability of hiPSC lines, we performed an extensive characterization of pluripotency, proliferation and cell cycle dynamics of all six hiPSC lines through immunocytochemistry against pluripotency marker OCT4, proliferation marker Ki-67 and EdU incorporation experiments. We further assessed the apoptosis rate of hiPSCs by staining against apoptotic marker cCas3. These experiments were carried out in three consecutive passages of all iPSC lines, providing statistical power to the analyses. None of these experiments revealed significant differences between PCH2a and control iPSC lines (see Figure 2).

      The authors state that the hiPSC lines have been characterised by SNP arrays to show that no genomic / chromosomal aberrations have been accrued due to reprogramming. The manuscript should include information as to when the SNP array was performed (i.e., immediately after reprogramming, after initial passaging, etc) and also include the results of the SNP array as additional information. What passage were the hiPSC when the presented experiments were carried out?

      In agreement with this comment, we provide data of SNP arrays that were performed to ensure the chromosomal integrity of all cell lines (see supplement). Further, we added details on passages of the cell lines in the respective figure legends as suggested by the reviewer. In brief, all cell lines were kept below passage 20 and were subjected to pluripotency testing before differentiations were started.

      Given that TSEN54 is broadly and strongly expressed in the developing nervous system, the very limited staining of the organoids for TSEN54 in Figure 2 is surprising. Can the authors provide an explanation for the fact that TSEN54 is only expressed in a small subset of cells? Which cell types are these? Moreover, high-magnification images should be shown to demonstrate the subcellular staining pattern of TSEN54. Quantification of TSEN54 protein levels by immunoblotting would also be beneficial.

      Related to this observation, it is puzzling that the large size differences that the authors observe in their organoids would be driven by such a small number of TSEN54-expressing cells. How do the authors explain this discrepancy?

      We thank the reviewer for this comment. We have carefully assessed human cerebellar development transcriptomic datasets, which demonstrate that TSEN54 is in fact not strongly but moderately expressed in the human developing nervous system. Additionally, TSEN54 is expressed in various cell types and is not limited to a subset of them (Aldinger et al 2021, Sepp et al 2021). We agree with this reviewer and reviewer 3 that Western blotting or other types of quantification would be informative, as would investigation of the subcellular localization of the protein. However, these questions go beyond the scope of the current manuscript, which aims to present a disease model. We have therefore decided to remove the characterization of TSEN54 expression in organoids from our revised manuscript.

      The generated organoids need to be better characterised with a broader range of markers using both qPCR and immunostaining. At the moment, their identity as "cortical" and "cerebellar" organoids remain unconvincing. This is particularly true for cerebellar organoids, which are challenging to generate and are not widely used. The authors should include additional markers (for example, see PMIDs 25640179, 29397531, 32117945) and immunostaining should clearly show expected staining patterns.

      In Figure 5, it appears that some markers (e.g., SATB2) are expressed differently between control and patient lines, yet this is not commented on by the authors who conclude that control and patient lines show differentiation into organoids.

      We thank the reviewer for this suggestion. We performed further immunostainings using the markers that were used in other cerebellar organoid papers (Muguruma et al 2015, Silva et al 2020, Watson et al 2018) as the reviewer suggested. In detail, we added immunohistochemistry experiments on Day 30 and Day 50 of differentiation for early Purkinje cell markers OLIG2 and SKOR2. We also included ATOH1 as a marker for rhombic lip-derived granule cells. For the neocortical organoids, we believe that the performed characterization is sufficient since the protocol we used is well-established and widely used as also indicated by the reviewer. We agree that the cellular composition of the organoids should be investigated in detail (for instance using single-cell transcriptomics). However, we believe this is out of the scope of this manuscript, which describes the establishment of a brain-region specific model platform.

      The authors attempt to look into a potential mechanism for the size differences observed between control and patient organoids. However, only cleaved caspase-3 is used as a marker for apoptosis and no differences were observed. The authors should include further markers for potential cell death. In addition, immunostaining for proliferation markers (i.e., KI67) should be performed to evaluate whether the difference in organoid size could stem from decreased proliferation rather than increased cell death.

      We agree with the reviewer and included a quantification of the proliferation marker Ki-67 within the SOX2 positive population of cerebellar and neocortical organoids as well as the quantification of SOX2 positive areas within the organoids (Figure 6). We observed significant differences in proliferation between PCH2a and control cerebellar organoids. Moreover, we also analyzed the morphology of organoids and quantified the thickness and number of rosettes and found significant differences between control and PCH2a cerebellar organoids, corroborating the notion that proliferation is altered in cerebellar organoids. Neocortical organoids do not show any significant differences in proliferation and Sox2+ structures. Only the thickness of the Sox2+ areas is slightly decreased in neocortical PCH2a organoids compared to controls. In order to deepen our analysis of a possible increase in apoptosis in PCH2a organoids, we also quantified cCas3 in Sox2+ structures (Figure 5), as also suggested by Reviewer 2. These analyses did not show any significant differences between PCH2a and control organoids. We therefore suggest that at the early stages of differentiation studied here, proliferative differences are the main reason for the size differences between PCH2a and control organoids.

      Reviewer #1 (Significance (Required)):

      The authors present an innovative approach to study neurodevelopmental disorders using brain organoids and should be of interest to researchers and clinicians working on neurodevelopmental diseases. However, the data presented are too limited to support any conclusions about the phenotype observed. Furthermore, questions remain about the used methodology and more work is needed to demonstrate the successful generation of both cortical and cerebellar organoids.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Please find enclosed my recommendation for the paper submitted by Kagermeier et al entitled 'Human organoid model of PCH2a recapitulates brain region-specific pathology'. It describes the development of a human model for PCH2a and its characterization. My overall assessment of the paper is 'Major revision' which is explained below.

      Although the paper is very well written and clearly interesting in that it describes the generation and initial analyses of a human organoid model for PCH2a, it should be revised such that it proves the points it is trying to make. The authors are meticulous in their studies combining cellular characterization and a thorough initial screen of organoid (both cerebellar as well as cortical) integrity, yet hardly any mechanistic data is provided. Nevertheless, if the authors are able to add additional experiments and are able to address the points raised, the reviewer may be willing to consider a more positive outcome.

      Major concerns

      1) The overall quality of the figures is poor. There is a lot of overexposure such that often cellular or tissue structures are blended. It starts with Figure 1 G and H but can be observed throughout the manuscript. Deconvolution would greatly enhance their results.

      We are thankful for this comment and we have improved the quality of all microscopy images.

      2) Especially figure 4 and 5 could have been complemented with quantitative data. Furthermore, they seem more like supplemental figures, as these are just proof-of-principle stainings. No conclusions can be drawn from the panels except that all markers are there in the various conditions. And while they are showing a neural rosette in Fig 4A, just tiny ones can be observed in 4B. It is also not clear what the whole mount IHC adds in comparison to the IHC on sections. It is also strange that there is still a lot of SOX2 in the CALB/MAP2-positive area, but again with this magnification this is hard to appreciate.

      We agree with the reviewer that so far we presented qualitative proof-of-principle stainings that demonstrate cerebellar and neocortical differentiation, respectively. In order to address the comment of the reviewer, we improved the quality of the images and also provided higher magnification and enhanced resolution. Additionally, we now provide detailed quantifications of SOX2+ and Ki67+ neural progenitor cells and show that differences observed between PCH2a and control cerebellar organoids may explain the size differences observed between organoids (Figure 6). Our study provides the basis for more in-depth analysis of differences in differentiation and cell type composition between PCH2a and control organoids in the future, for example through single-cell RNAseq.

      3) If the authors would like to prove the point that cerebellar/cortical development is hampered, more functional assays could have been done. Nothing is analysed on the fraction of progenitor cells present (such as the percentage of Tbr2+ IPC in VZ/CP). Furthermore, if there is a suspicion that the number of cells is affected (which is also not shown), proliferation/cell cycle exit experiments using BrdU/EdU should have been performed. Early cell cycle exit still cannot be ruled out and should have been tested by the combination of Ki67-/EdU+ percentage of a certain fraction of progenitor cells (e.g., PAX6+ pool).

      We thank the reviewer for this valuable suggestion and agree that it would be interesting to carry out respective experiments. In this study, we show the establishment of a brain-region-specific organoid platform as a disease model for PCH2a and are only at the beginning of deciphering the underlying mechanism. In the revised manuscript, we quantified Ki-67+/Sox2+ cells in proliferative zones in the organoids. We believe that future studies including BrdU / EdU incorporation assays as well as scRNA-seq will answer the questions raised here and decipher the disease-causing mechanism on both cellular and molecular levels but are beyond the scope of this manuscript.

      4) Instead the authors chose to only perform a cCas3 staining. From the panels in Figure 6 it is hard to appreciate which cells are actually cCas3+. Also, the analyses were performed on the total pool of cells, while it might have been more interesting to look for cell death of the various progenitor pools (e.g., the SOX2+ pool).

      We agree with the reviewer that a more in-depth analysis of apoptotic cell populations is interesting and performed cCas3/Sox2+ quantification for cerebellar and neocortical organoids. We did not observe significant differences of cCas3 expression within the SOX2+ cell population (Figure 5).

      Minor concerns

      1) It would greatly enhance the review process if line numbers are added

      We have added line numbers to the manuscript.

      2) On general concepts (such as the generation of organoids in the context of disease) more references could have been added

      We have added more references and discussed the topic of brain organoids as disease models as suggested by this reviewer (Eichmüller & Knoblich 2022, Khakipoor et al 2020, Velasco et al 2020).

      Figures

      Fig. 1: In A, the square is clearly visible and not similar to B. An annotation of which is the control and which is the patient is missing in the figure. The arrows are hardly visible; I would make them slightly bigger and remove the black outer lining. Figure 1C can easily go to the Supplemental material. In Fig 1D it is hard to appreciate the staining; a close-up with a bright field microscope will help. E-I: Most of the panels, but especially G and H, are overexposed. In J, it is hard to appreciate the TSEN54 staining. Maybe separate channels and a merge?

      We thank the reviewer for bringing these details to our attention. We have changed the arrows in the figure to enhance their visibility. Further we have adjusted the quality of the images overall. Lastly, we have made a comment in the figure legend clearly stating which scan came from which child. The described square was added to hide facial features of the imaged individuals hence they are not identical.

      Fig. 3: Usually go into the supplementals.

      Since organoid size is a major first readout when modeling a disorder that is characterized by a reduction of the volume of specific brain regions, we decided to keep this readout in the main text.

      Fig 4/5: Lack of quantitative data and poor quality of figures (overexposure).

      Fig 6: Many of the SOX2 panels are overexposed

      We thank the reviewer for the suggestions on the figures and addressed the concerns in the revised manuscript.

      CROSS-CONSULTATION COMMENTS

      I completely agree with reviewers #1 and #3. It is good to notice that we are overall on the same page.

      Reviewer #2 (Significance (Required)):

      The authors definitely made an excellent start to model PCH2a. Three controls and three patient lines are good to begin with, but isogenic controls using one parental line and a patient line where the mutation is fixed would have been ideal. It is interesting that there seems to be a brain area-specific pathology of the phenotype. Yet, more thorough analyses could have been performed, such as proliferation, differentiation and cell cycle exit experiments. As for now, the mostly descriptive data are only scratching the surface and little can be concluded on the molecular framework they are trying to solve.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this study Kagermeier et al. use human cerebellar and neocortical organoids to investigate the effects of the PCH2a-causing homozygous TSEN54 c.919G>T variant on the neurodevelopment of different brain regions. They reveal a substantial growth defect in both neocortical and cerebellar regions with a more profound phenotype in the cerebellum. They continue to investigate major cell types of neurodevelopment in both regions and briefly potential mechanisms underlying the phenotypes. The study is well conceived and addresses the current gap of disease-modeling in cerebellar organoids; nevertheless, some major claims are not sufficiently substantiated in the current version. Below, I provide suggestions on how to improve the manuscript with some additional minor comments that might help with readability and accessibility of the work.

      Major comments:

      1. TSEN54 expression levels: The authors compare RNA and protein expression levels for TSEN54 to investigate the mutation's effect. For this the authors use qPCR on iPSCs and organoids of different age and immunostainings and conclude "we did not find differences in expression between cell and tissue types". There are some issues with this analysis as explained below:

      -The qPCR data (Fig. 2B) is first normalized to a housekeeping gene (GAPDH), however, then all organoid data are additionally normalized to the respective iPSC line. Thus, in case there is already a difference on iPSC level, this normalization might mask any difference in the organoids. It is unclear why this approach was chosen, and it seems more appropriate to show the data just normalized to GAPDH than additionally normalizing to the iPSCs, or at least to show first that iPSCs do not have differences in TSEN54 expression. Furthermore, even though apparently not statistically significant there seems to be a strong trend of lower TSEN54 levels in PCH2a in neocortical organoids, but even more so in cerebellar organoids. In my view this would fit very well with the study and should be further explored before concluding there is no statistical difference. Considering the high error bars of the cerebellar organoid samples, a higher N-number might be necessary to reach statistical significance in the difference in expression. Most importantly, it would be appropriate to show single data points where possible and to mark the different cell lines (as done in other figures), as otherwise it is not possible to judge whether there is a cell line bias in the data.

      -The evidence for protein expression of TSEN54 is immunofluorescence stainings for all conditions. As there is no quantification, the authors should not conclude differences, or the lack thereof, based on this qualitative data. Furthermore, in the one example shown, the PCH2a cerebellar condition (Fig 2D) in fact seems to show lower expression levels compared with other conditions. This could be due to the selected image, as all other examples include large neural rosettes with strong staining in the center of the rosettes. Furthermore, it is unclear what cell line these stainings come from, or even whether the PCH2a cerebellar and neocortical stainings come from the same cell line. Thus, the authors should select comparable examples for all conditions, and ideally provide staining examples (e.g., as supplementary data) for the other replicates to ensure expression in all replicates. If the authors want to comment on differences in protein expression, maybe a quantitative approach (e.g., quantitative western blot) would be more appropriate. Otherwise, the statements should be adjusted to not conclude whether TSEN54 protein levels differ or not.

      -Irrespective of the above comments the conclusion of the section "TSEN54 expression in cerebellar and neocortical organoids", that currently reads "we did not find differences in expression between cell and tissue types" should be changed, as the authors did not investigate whether there are cell type-specific differences of TSEN54 expression.

      We thank the reviewer for this comment. We agree that the provided data is not suitable for quantitative analysis of TSEN54 expression. Please also see our related response to the similar concern raised by reviewer 1. Thanks to these suggestions, we have decided to exclude the TSEN54 expression data from the current manuscript as a detailed analysis should be part of an extensive future study.
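
      The normalization concern raised above can be illustrated with a minimal Python sketch. All Ct values below are invented for illustration and are not data from this study; the function names `delta_ct` and `relative_expression` are hypothetical. Under the assumed scenario, the patient line expresses half as much target transcript as the control at both the iPSC and the organoid stage, yet normalizing each organoid to its own iPSC line yields identical fold changes.

```python
# Hypothetical Ct values for illustration only; not data from the study.
# A lower Ct means higher expression; one cycle corresponds to a factor of 2.

def delta_ct(ct_target, ct_reference):
    """Ct of the target gene minus Ct of the housekeeping gene (e.g. GAPDH)."""
    return ct_target - ct_reference

def relative_expression(dct_sample, dct_calibrator):
    """2^-(ddCt): expression of a sample relative to a calibrator sample."""
    return 2 ** -(dct_sample - dct_calibrator)

# Assumed scenario: the patient line is one cycle behind the control
# (half the expression) at both the iPSC and the organoid stage.
control = {"ipsc": delta_ct(25.0, 18.0), "organoid": delta_ct(24.0, 18.0)}
patient = {"ipsc": delta_ct(26.0, 18.0), "organoid": delta_ct(25.0, 18.0)}

# Normalizing each organoid sample to its own iPSC line.
fold_control = relative_expression(control["organoid"], control["ipsc"])
fold_patient = relative_expression(patient["organoid"], patient["ipsc"])

# Normalizing to the housekeeping gene only and comparing organoids directly.
patient_vs_control = relative_expression(patient["organoid"], control["organoid"])

print(fold_control, fold_patient, patient_vs_control)
```

      In this toy case the two per-line fold changes are equal, while the direct comparison of organoids shows the patient line at half the control level, which is the masking effect the reviewer describes.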

      Organoid growth analysis:

      The organoid growth analysis in Figure 3 and supplementary Figure 2 shows the main phenotype of the study, which seems to be very strong. The authors use unpaired t-tests to compare within the different timepoints. Unfortunately, I think this approach might not be appropriate: even though the Welch correction does not rely on similar SDs in the compared groups (Control vs. PCH2a), it still assumes that all data points within each group share the same variance. However, this is not the case, as e.g., the control condition includes three groups (Control-1 to -3) that might differ in variance; as such, not all data points are independent of each other. Potentially, ANOVA analyses controlling for cell line and timepoint might be more appropriate. Alternatively, the authors could consider using the linear regression analysis in Supplementary Figure 2 to further investigate the difference in organoid growth by e.g., comparing the slope of the regression lines. This might reflect the growth deficit over time more appropriately than simply comparing each timepoint individually. Expanding on this analysis, the regression analysis requires some more information on the fit (intercept, slope, R-squared of the model), which would help clarify the growth dynamics in the different systems and conditions.

      We thank the reviewer for the suggestions on statistical analysis and adjusted our approach accordingly. Briefly we performed 3-way-ANOVA analysis for the growth curves which revealed no significant differences between the different lines within the groups (Control or PCH2a) at different time points. Additionally, we added the linear regression model to the results (See Figure 3 and supplementary table 2, with the information on the curve fit).
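
      As a rough illustration of the fit parameters requested by the reviewer (slope, intercept, R-squared), the following Python sketch fits an ordinary least-squares line to organoid size over time for each group. The timepoints and size values are invented, the helper `fit_line` is hypothetical, and this is a simplified stand-in rather than the 3-way ANOVA or the regression model actually used in the manuscript.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: 1 minus residual sum of squares over total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

# Hypothetical timepoints (days) and mean organoid areas (arbitrary units).
days = [10, 20, 30, 50, 70, 90]
control_area = [1.0, 2.0, 3.0, 5.0, 7.0, 9.0]
pch2a_area = [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]

control_fit = fit_line(days, control_area)
pch2a_fit = fit_line(days, pch2a_area)

# A slope ratio below 1 indicates slower growth in the patient group.
slope_ratio = pch2a_fit[0] / control_fit[0]
print(control_fit, pch2a_fit, slope_ratio)
```

      A formal comparison of slopes would additionally need the standard errors of the fits and a model term for cell line, which this sketch deliberately omits.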

      The growth ratio analysis (Figure 3D) is essential to the major claim of the paper that the organoids replicate the region-specific differences. As the authors performed all experiments with matching cell lines this could additionally strengthen the argument by generating the ratio of size differences for each cell line separately (instead of just for all PCH2a lines together). This would allow comparison of the same genetic background in both cerebellar and neocortical condition and further corroborate the region-specific severity of the phenotype. Potentially, this would also enable to test these differences statistically.

      We appreciate the suggestion to compare the differentiation protocols by line. Below we display the line-by-line analysis between the two differentiation protocols at D30 (A), D50 (B), and D90 (C). In order to visualize the differences in size between the two protocols more clearly, we have generated ratios of the average organoid sizes between neocortical and cerebellar organoids (D). The analysis corroborates our previous visualizations and statistics (3-way ANOVA) by showing that PCH lines produce neocortical and cerebellar organoids that differ in size more than those of control lines. The differences are most pronounced at D30 and D90. However, we believe that this analysis does not add additional value to our manuscript and have therefore decided not to include it in the revised version.
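
      The per-line ratio described here can be sketched in a few lines of Python. The size values are invented placeholders at a single assumed timepoint, and the helper `size_ratio` is hypothetical; only lines with both differentiations available are listed, mirroring the absence of PCH-1 from the neocortical data.

```python
from statistics import mean

# Hypothetical mean organoid sizes (arbitrary units) at one timepoint.
sizes = {
    "Control-1": {"neocortical": 4.0, "cerebellar": 2.0},
    "Control-2": {"neocortical": 5.0, "cerebellar": 2.5},
    "Control-3": {"neocortical": 3.0, "cerebellar": 1.5},
    "PCH-2": {"neocortical": 3.0, "cerebellar": 1.0},
    "PCH-3": {"neocortical": 4.5, "cerebellar": 1.5},
}

def size_ratio(entry):
    """Neocortical size divided by cerebellar size for one cell line."""
    return entry["neocortical"] / entry["cerebellar"]

# One ratio per cell line, so each genetic background is compared with itself.
ratios = {line: size_ratio(entry) for line, entry in sizes.items()}

# Group means of the per-line ratios.
control_mean = mean(r for line, r in ratios.items() if line.startswith("Control"))
pch_mean = mean(r for line, r in ratios.items() if line.startswith("PCH"))
print(ratios, control_mean, pch_mean)
```

      A larger ratio in the patient lines than in the controls would indicate that the cerebellar organoids are disproportionately small relative to neocortical organoids from the same genetic background.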

      Additionally, all growth analyses for the neocortical organoids (Figure 3C, Supplementary Figure 2B and C) seem to lack the PCH-1 cell line and only contain PCH-2 and PCH-3. This cell line should be added or commented on why it was excluded from the analyses.

      We agree with the reviewer. Unfortunately, we experienced contamination in that specific differentiation and therefore cannot provide the data. We have made a related comment in the manuscript. Since all differentiations were performed in parallel, adding this line at a later time point would add additional confounders and is therefore undesirable.

      Potential mechanism of the phenotype (apoptosis analysis):

      In Figure 6 the authors investigate the hypothesis that increased apoptosis contributes to the phenotypes. In the cleaved Caspase 3 staining there appear to be no differences. Unfortunately, the analysis apparently only includes one replicate (one organoid?) per cell line and condition. Considering the variability in the data shown this seems inappropriately low and should ideally contain ~3 replicates per cell line condition to judge technical and biological variability if the authors want to make the point that there is no "significant difference between PCH2a and control organoids at any time point in both cerebellar and neocortical organoids". Otherwise, this claim does not seem to be substantiated enough by the data.

      Finally, due to the absence of a phenotype related to apoptosis the authors conclude that the phenotypes may be due to "deficits in the proliferation of progenitor cells". Although this is mentioned in the introduction and the discussion, there is no evidence in the current study that supports this interesting idea. By adding relatively straight forward co-staining experiments for e.g., SOX2 (progenitors) and Ki67 (proliferating cells), the authors could provide further evidence for this hypothesis using existing organoid sections. This would support this speculative idea and could add a more mechanistic insight to the study, thereby making it more exciting.

      To address this concern, we have now added a table to the supplement that describes in detail which organoids / batches / cell lines were used for which experiment (Supplementary table 3). In addition to our previous cCas3 quantifications, we performed the quantification of cCas3 within the population of SOX2-positive cells, which was suggested by Reviewer 2 (Figure 5).

      To assess the alternative hypothesis, that proliferation deficits account for the size differences observed between organoids, we also performed quantifications of SOX2-positive zones in the organoids at D30 and D50 of differentiation as well as quantifications of Ki-67 positive cells within the SOX2-positive population. For cerebellar organoids we found significant differences in these experiments (Figure 6). We believe that this data supports the hypothesis of aberrant proliferation in PCH2a cerebellar organoids explaining the size differences.

      Minor comments:

      • Cell line and quality control: The authors recruit three male patients with PCH2a and reprogram iPSCs. These cell lines are subjected to a well performed extensive quality control. However, it is unclear what cell lines the stainings (e.g., Fig. 1D to I) originate from. Furthermore, the supplementary qPCR analysis (Supplementary Figure 1) includes only the PCH-1 line, and additionally two cell lines that are not explained (F-CO and hESC-I3). It is unclear what the relevance of showing the qPCR of these cell lines is. To ensure proper QC for all used cell lines the authors should provide data for all cell lines (PCH-1 to -3 and control-1 to -3), or at least summarize (e.g., in a table) what QC metrics were applied to which cell line. Most importantly, this information is completely lacking for the control cell lines and the QC is just mentioned in the text. Unfortunately, it is unclear where the control cell lines originate from, and some basic information would be required to judge whether they are appropriate controls: are they iPSC or ESC, were they reprogrammed with a similar paradigm as the PCH2a cells, what is the gender of the control cell lines (all PCH2a cell lines are apparently male)?

      In line with a similar comment from reviewer 1, we have included a table that provides information on the origin of all six cell lines used in the revised manuscript (methods section). Further, we provide SNP-Array data on all cell lines as supplementary material. We also performed detailed characterization of pluripotency, proliferation and cell cycle dynamics of all six hiPSC lines through immunocytochemistry against pluripotency marker OCT4, proliferation marker Ki-67 and EdU incorporation experiments (Figure 2). We further assessed the apoptosis rate of hiPSCs by staining against apoptotic marker cCas3. None of these experiments revealed significant differences between PCH2a and control iPSC lines (see Figure 2). In line with the suggestion of this reviewer, we removed the qPCR analysis of iPSCs from the manuscript.

      • To make the study more approachable for a medical audience and to judge the variability in phenotype presentation among the recruited patients it would be appreciated if more information on the patients would be provided. The authors write: "We identified three individuals that display the genetic, clinical and brain imaging features previously described for PCH2a.". This information including age/date of birth, as well as other medically relevant information could be provided in the supplementary figure (e.g., is there a difference in disease burden among the different patients?). This would allow judging the recruited cohort better.

      We thank the reviewer for this insightful comment. We provided a table with detailed clinical information (supplementary table 1).

      • According to the method section the cerebellar and neocortical organoids were cultured in very different media, especially at later timepoints. While neocortical organoids were kept in a neural maintenance medium based on Neurobasal-A, cerebellar organoids were kept in a medium based on BrainPhys. These media contain very different levels of nutrients, especially of glucose (25 mM vs 2.5 mM, Bardy et al. 2015). This can have a strong effect on proliferation of progenitors and proliferative phenotypes (e.g., see Eichmüller et al. 2022). Especially as the authors claim that there is a difference in the PCH2a phenotypes between brain regions, it should be excluded that this is due to medium differences at later timepoints. When investigating the growth curves of Figure 3B and C it seems that the major difference in growth speed is that neocortical organoids grow faster in early timepoints (

      We agree that media composition can greatly influence growth dynamics of cells in 2D and 3D. However, in this study we assess the differences between two groups: the PCH2a and control iPSC-derived organoids. The differences we describe are in relation to the respective control group and iPSCs were generated following the same protocol in the same lab. We believe that by following two protocols and comparing the three PCH2a to the three control lines within each protocol predominantly, we account for different media composition possibly changing growth dynamics.

      • Staining examples shown and presentation: In several figures the authors could improve the presentation of the staining examples with some changes:

      o Cell line information for images: as the authors only ever note the condition (PCH2a or Control) but not the cell line it is unclear if the stainings all come from one cell line or from multiple different cell lines. This prevents comparing the different differentiation conditions. Additionally, for major conclusions the authors should consider including supplemental stainings or further information on how reproducible the results shown are (how many cell lines and batches were used?).

      We thank the reviewer for these suggestions. We added information on cell lines and passages for all experiments shown in this study in the figure legends. Moreover, we also added a table providing information on n-numbers for all experiments (supplementary table 3).

      o Selection of examples: in several cases (Fig 2C/D, 4A, 6A/B) the selected images depict very different regions, e.g., one condition shows a large rosette, while in the other condition no rosette can be seen. It would be more appropriate to show matching examples where possible.

      We agree with the reviewer and have chosen matched regions of interest in the figure panels in the revised version of the manuscript. Please note that for cerebellar organoids we observed a significant difference in the timepoint of appearance of these rosette-like structures. Therefore, an exact matching of regions of interest was not possible due to biological differences between the samples, which we have also quantified (Figure 6).

      o Color code of stainings: Colors do not match throughout the manuscript in immunofluorescence images. E.g., Fig. 4 uses blue, green, red, magenta and Fig. 5 uses blue, green, magenta, cyan. It would be preferable to adhere to one color code. Considering that a significant fraction of the population has red-green color blindness, the latter color code seems more appropriate, as it should ensure readability also for color-blind audiences.

      We are thankful for this comment. We changed the color code to make figures more widely accessible.

      • Small typos:

      o Figure 1 legend: last sentence "The" instead of "Th"

      o Supplementary Figure 1B: PCH-2 is named "PCH-22"

      o Supplementary Figure 2: As in the main figure for neocortical organoids the PCH-1 condition is missing (see comment on organoid growth curves). Additionally, the color/shape code of the plots in B does not always match the legend (e.g., size in left plot is different and color of PCH-3 in middle and left plot differs from legend and right plot).

      o It is unclear why the cortical organoids are referred to as "neocortical organoids" in the figures and the text. The methods and the reference in the methods as well as all major papers rather use the word "cortical".

      We addressed these suggestions and thank the reviewer for bringing these to our attention. Unfortunately, we could not include data on PCH-01 in neocortical differentiation due to a contamination in this batch. We made sure to run all the batches presented here in parallel so that all conditions are equivalent, preventing us from including a different batch at a later time point.

      We believe that in the context of our study, it is important to highlight cortical organoids as neocortical organoids, because we are also showing cerebellar organoids and there is also a cerebellar cortex.

      References:

      Bardy, C. et al. Neuronal medium that supports basic synaptic functions and activity of human neurons in vitro. Proc National Acad Sci 112, E3312 (2015).

      Eichmüller, O. L. et al. Amplification of human interneuron progenitors promotes brain tumors and neurological defects. Science 375, (2022).

      CROSS-CONSULTATION COMMENTS

      I agree with the comments of the other reviewers and, as they are mostly matching, this reinforces the importance of improving certain aspects of the manuscript. As there are no deviating issues, I do not comment specifically on any reviewer comments.

      Reviewer #3 (Significance (Required)):

      This work uses organoid technology to shed light on brain region-specific phenotypes in PCH2a. Brain organoids have drastically changed the way we study human neurological diseases (Eichmüller and Knoblich 2022); however, most brain organoid research has focused on cortical organoids. Cerebellar organoid protocols have existed for some time (Muguruma et al. 2015, Silva et al. 2020, Nayler et al. 2021) but have not yet been applied to uncover new disease biology. Especially considering the important role of human-specific cerebellar processes in specific developmental disorders (Haldipur et al. 2021) and cancer (Hendrikse et al. 2022, Smith et al. 2022), disease modeling in human cerebellar organoids holds great potential for understanding disease biology. The work by Kagermeier et al. demonstrates that human cerebellar organoids recapitulate brain region-specific growth deficits and is thus an important step forward for disease modeling. Therefore, this work will be interesting to researchers working on brain development and disease modeling, especially in in-vitro systems. Nevertheless, the mechanistic insight of the study is limited, as is the insight into how human-specific processes might be involved in the pathogenesis of PCH2a. It will therefore be interesting to see how this disease model will be used in the future to investigate the cell types and mechanisms involved in the PCH2a phenotype.

      Personal field of expertise: Brain organoids and disease modeling in organoids especially of neurodevelopmental diseases. Analysis of organoids with stainings, as well as sequencing techniques, and bioinformatics.

      References:

      Eichmüller, O. L. & Knoblich, J. A. Human cerebral organoids - a new tool for clinical neurology research. Nat Rev Neurol 1-20 (2022) doi:10.1038/s41582-022-00723-9.

      Haldipur, P. et al. Evidence of disrupted rhombic lip development in the pathogenesis of Dandy-Walker malformation. Acta Neuropathol 142, 761-776 (2021).

      Hendrikse, L. D. et al. Failure of human rhombic lip differentiation underlies medulloblastoma formation. Nature 609, 1021-1028 (2022).

      Muguruma, K., Nishiyama, A., Kawakami, H., Hashimoto, K. & Sasai, Y. Self-Organization of Polarized Cerebellar Tissue in 3D Culture of Human Pluripotent Stem Cells. Cell Reports 10, 537-550 (2015).

      Nayler, S., Agarwal, D., Curion, F., Bowden, R. & Becker, E. B. E. High-resolution transcriptional landscape of xeno-free human induced pluripotent stem cell-derived cerebellar organoids. Sci Rep-uk 11, 12959 (2021).

      Silva, T. P. et al. Scalable Generation of Mature Cerebellar Organoids from Human Pluripotent Stem Cells and Characterization by Immunostaining. J Vis Exp (2020) doi:10.3791/61143.

      Smith, K. S. et al. Unified rhombic lip origins of group 3 and group 4 medulloblastoma. Nature 609, 1012-1020 (2022).

      References by the authors

      Aldinger KA, Thomson Z, Phelps IG, Haldipur P, Deng M, et al. 2021. Spatial and cell type transcriptional landscape of human cerebellar development. Nat Neurosci 24: 1163-75

      Eichmüller OL, Knoblich JA. 2022. Human cerebral organoids — a new tool for clinical neurology research. Nature Reviews Neurology 18: 661-80

      Khakipoor S, Crouch EE, Mayer S. 2020. Human organoids to model the developing human neocortex in health and disease. Brain Res 1742: 146803

      Muguruma K, Nishiyama A, Kawakami H, Hashimoto K, Sasai Y. 2015. Self-organization of polarized cerebellar tissue in 3D culture of human pluripotent stem cells. Cell Rep 10: 537-50

      Sepp M, Leiss K, Sarropoulos I, Murat F, Okonechnikov K, et al. 2021.

      Silva TP, Fernandes TG, Nogueira DES, Rodrigues CAV, Bekman EP, et al. 2020. Scalable Generation of Mature Cerebellar Organoids from Human Pluripotent Stem Cells and Characterization by Immunostaining. J Vis Exp

      Strassler ET, Aalto-Setala K, Kiamehr M, Landmesser U, Krankel N. 2018. Age Is Relative-Impact of Donor Age on Induced Pluripotent Stem Cell-Derived Cell Functionality. Front Cardiovasc Med 5: 4

      Studer L, Vera E, Cornacchia D. 2015. Programming and Reprogramming Cellular Age in the Era of Induced Pluripotency. Cell Stem Cell 16: 591-600

      Velasco S, Paulsen B, Arlotta P. 2020. 3D Brain Organoids: Studying Brain Development and Disease Outside the Embryo. Annu Rev Neurosci 43: 375-89

      Watson LM, Wong MMK, Vowles J, Cowley SA, Becker EBE. 2018. A Simplified Method for Generating Purkinje Cells from Human-Induced Pluripotent Stem Cells. Cerebellum 17: 419-27

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Summary: In this study Kagermeier et al. use human cerebellar and neocortical organoids to investigate the effects of the PCH2a-causing homozygous TSEN54 c.919G>T variant on the neurodevelopment of different brain regions. They reveal a substantial growth defect in both neocortical and cerebellar organoids, with a more profound phenotype in the cerebellar organoids. They go on to investigate major cell types of neurodevelopment in both regions and briefly explore potential mechanisms underlying the phenotypes. The study is well conceived and addresses the current gap in disease modeling with cerebellar organoids; nevertheless, some major claims are not sufficiently substantiated in the current version. Below, I provide suggestions on how to improve the manuscript, with some additional minor comments that might help with readability and accessibility of the work.

      Major comments:

      1. TSEN54 expression levels: The authors compare RNA and protein expression levels of TSEN54 to investigate the mutation's effect. For this, the authors use qPCR on iPSCs and organoids of different ages as well as immunostainings, and conclude "we did not find differences in expression between cell and tissue types". There are some issues with this analysis, as explained below:

      - The qPCR data (Fig. 2B) are first normalized to a housekeeping gene (GAPDH); however, all organoid data are then additionally normalized to the respective iPSC line. Thus, in case there is already a difference at the iPSC level, this normalization might mask any difference in the organoids. It is unclear why this approach was chosen, and it seems more appropriate to show the data normalized to GAPDH only rather than additionally normalizing to the iPSCs, or at least to show first that iPSCs do not differ in TSEN54 expression. Furthermore, even though apparently not statistically significant, there seems to be a strong trend toward lower TSEN54 levels in PCH2a in neocortical organoids, and even more so in cerebellar organoids. In my view this would fit very well with the study and should be further explored before concluding there is no statistical difference. Considering the large error bars of the cerebellar organoid samples, a higher N-number might be necessary to reach statistical significance for the difference in expression. Most importantly, it would be appropriate to show single data points where possible and to mark the different cell lines (as done in other figures), as otherwise it is not possible to judge whether there is a cell line bias in the data.

      - The evidence for protein expression of TSEN54 consists of immunofluorescence stainings for all conditions. As there is no quantification, the authors should not conclude differences, or the lack thereof, based on these qualitative data. Furthermore, in the one example shown, the PCH2a cerebellar condition (Fig 2D) in fact seems to show lower expression levels compared with the other conditions. This could be due to the selected image, as all other examples include large neural rosettes with strong staining in the center of the rosettes. It is also unclear what cell line these stainings come from, and even whether the PCH2a cerebellar and neocortical stainings come from the same cell line. Thus, the authors should select comparable examples for all conditions, and ideally provide staining examples (e.g., as supplementary data) for the other replicates to ensure expression in all replicates. If the authors want to comment on differences in protein expression, a quantitative approach (e.g., quantitative western blot) might be more appropriate. Otherwise, the statements should be adjusted so as not to conclude whether TSEN54 protein levels differ or not.

      - Irrespective of the above comments, the conclusion of the section "TSEN54 expression in cerebellar and neocortical organoids", which currently reads "we did not find differences in expression between cell and tissue types", should be changed, as the authors did not investigate whether there are cell type-specific differences in TSEN54 expression.

      2. Organoid growth analysis: The organoid growth analysis in Figure 3 and Supplementary Figure 2 shows the main phenotype of the study, which seems to be very strong. The authors use unpaired t-tests to compare within the different timepoints. Unfortunately, I think this approach might not be appropriate: even though the Welch correction does not rely on similar SDs in the compared groups (Control vs. PCH2a), it still assumes that all data points within each group share the same variance. However, this is not the case, as, e.g., the control condition includes three groups (Control-1 to -3) that might differ in variance from each other, and as such not all data points are independent of each other. ANOVA analyses controlling for cell line and timepoint might be more appropriate. Additionally, the authors could consider using the linear regression analysis in Supplementary Figure 2 to further investigate the difference in organoid growth, e.g., by comparing the slopes of the regression lines. This might reflect the growth deficit over time more appropriately than simply comparing each timepoint individually. Expanding on this, the regression analysis requires some more information on the fit (intercept, slope, R-squared of the model), which would help clarify the growth dynamics in the different systems and conditions. The growth ratio analysis (Figure 3D) is essential to the major claim of the paper that the organoids replicate the region-specific differences. As the authors performed all experiments with matching cell lines, they could additionally strengthen the argument by generating the ratio of size differences for each cell line separately (instead of just for all PCH2a lines together). This would allow comparison of the same genetic background in both the cerebellar and the neocortical condition and further corroborate the region-specific severity of the phenotype. Potentially, this would also make it possible to test these differences statistically. Additionally, all growth analyses for the neocortical organoids (Figure 3C, Supplementary Figure 2B and C) seem to lack the PCH-1 cell line and only contain PCH-2 and PCH-3. This cell line should be added, or the authors should comment on why it was excluded from the analyses.

      3. Potential mechanism of the phenotype (apoptosis analysis): In Figure 6 the authors investigate the hypothesis that increased apoptosis contributes to the phenotypes. In the cleaved Caspase 3 staining there appear to be no differences. Unfortunately, the analysis apparently only includes one replicate (one organoid?) per cell line and condition. Considering the variability in the data shown, this seems inappropriately low; the analysis should ideally contain ~3 replicates per cell line and condition to judge technical and biological variability if the authors want to make the point that there is no "significant difference between PCH2a and control organoids at any time point in both cerebellar and neocortical organoids". Otherwise, this claim does not seem to be sufficiently substantiated by the data. Finally, due to the absence of a phenotype related to apoptosis, the authors conclude that the phenotypes may be due to "deficits in the proliferation of progenitor cells". Although this is mentioned in the introduction and the discussion, there is no evidence in the current study that supports this interesting idea. By adding relatively straightforward co-staining experiments for, e.g., SOX2 (progenitors) and Ki67 (proliferating cells), the authors could provide further evidence for this hypothesis using existing organoid sections. This would support this speculative idea and could add more mechanistic insight to the study, thereby making it more exciting.
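
      The normalization concern raised in the first major comment can be made concrete with a small numerical sketch. The Ct values below are hypothetical and chosen only for illustration; they are not taken from the manuscript, and all names are assumptions. The sketch assumes the common 2^-ΔCt calculation with GAPDH as the reference gene and shows how a two-fold difference that is already present in iPSCs can disappear once organoid values are additionally divided by the matching iPSC value.

```python
# Hypothetical Ct values as (TSEN54, GAPDH); NOT data from the manuscript.
CT_VALUES = {
    ("Control", "iPSC"): (24.0, 18.0),
    ("Control", "organoid"): (25.0, 18.0),
    ("PCH2a", "iPSC"): (25.0, 18.0),
    ("PCH2a", "organoid"): (26.0, 18.0),
}


def relative_expression(ct_target, ct_reference):
    """Expression relative to the reference gene, assuming 2^-dCt."""
    return 2.0 ** -(ct_target - ct_reference)


# Step 1: normalize every sample to GAPDH only.
gapdh_normalized = {
    sample: relative_expression(ct_target, ct_reference)
    for sample, (ct_target, ct_reference) in CT_VALUES.items()
}

# Step 2: additionally divide each organoid value by its own iPSC value.
ipsc_normalized = {
    condition: gapdh_normalized[(condition, "organoid")]
    / gapdh_normalized[(condition, "iPSC")]
    for condition in ("Control", "PCH2a")
}

# PCH2a relative to Control in organoids, using GAPDH normalization only.
organoid_fold = (
    gapdh_normalized[("PCH2a", "organoid")]
    / gapdh_normalized[("Control", "organoid")]
)

print("PCH2a/Control in organoids (GAPDH only):", organoid_fold)
print("Organoid/iPSC per condition:", ipsc_normalized)
```

      With these illustrative numbers the GAPDH-normalized organoid values differ two-fold between conditions, whereas the iPSC-normalized values are identical, which is why reporting the GAPDH-normalized data (or the iPSC comparison itself) is informative.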

      Minor comments:

      • Cell line and quality control: The authors recruit three male patients with PCH2a and reprogram iPSCs. These cell lines are subjected to a well-performed, extensive quality control. However, it is unclear what cell lines the stainings (e.g., Fig. 1D to I) originate from. Furthermore, the supplementary qPCR analysis (Supplementary Figure 1) includes only the PCH-1 line, and additionally two cell lines that are not explained (F-CO and hESC-I3). It is unclear what the relevance of showing the qPCR of these cell lines is. To ensure proper QC for all used cell lines the authors should provide data for all cell lines (PCH-1 to -3 and control-1 to -3), or at least summarize (e.g., in a table) what QC metrics were applied to which cell line. Most importantly, this information is completely lacking for the control cell lines and the QC is just mentioned in the text. Unfortunately, it is unclear where the control cell lines originate from, and some basic information would be required to judge whether they are appropriate controls: are they iPSC or ESC, were they reprogrammed with a similar paradigm as the PCH2a cells, what is the gender of the control cell lines (all PCH2a cell lines are apparently male)?

      • To make the study more approachable for a medical audience and to judge the variability in phenotype presentation among the recruited patients, it would be appreciated if more information on the patients were provided. The authors write: "We identified three individuals that display the genetic, clinical and brain imaging features previously described for PCH2a.". This information, including age/date of birth as well as other medically relevant information, could be provided in the supplementary figure (e.g., is there a difference in disease burden among the different patients?). This would allow the recruited cohort to be judged better.

      • According to the methods section, the cerebellar and neocortical organoids were cultured in very different media, especially at later timepoints. While neocortical organoids were kept in a neural maintenance medium based on Neurobasal-A, cerebellar organoids were kept in a medium based on BrainPhys. These media contain very different levels of nutrients, especially of glucose (25mM vs 2.5mM, Bardy et al. 2015). This can have a strong effect on the proliferation of progenitors and on proliferative phenotypes (e.g., see Eichmüller et al. 2022). Especially as the authors claim that there is a difference in the PCH2a phenotypes between brain regions, it should be excluded that this is due to medium differences at later timepoints. Judging from the growth curves in Figure 3B and C, the major difference in growth speed seems to be that neocortical organoids grow faster at early timepoints (<d30) but similarly at later timepoints, which would exclude effects of the media at late timepoints. Nevertheless, considering the strong effect that media glucose concentration can have, the authors should investigate whether there is an effect on growth speed at later timepoints by comparing control organoids. This could also strengthen the region-specific phenotype due to PCH2a.

      • Staining examples shown and presentation: In several figures the authors could improve the presentation of the staining examples with some changes:

      o Cell line information for images: as the authors only ever note the condition (PCH2a or Control) but not the cell line it is unclear if the stainings all come from one cell line or from multiple different cell lines. This prevents comparing the different differentiation conditions. Additionally, for major conclusions the authors should consider including supplemental stainings or further information on how reproducible the results shown are (how many cell lines and batches were used?).

      o Selection of examples: in several cases (Fig 2C/D, 4A, 6A/B) the selected images depict very different regions, e.g., one condition shows a large rosette, while in the other condition no rosette can be seen. It would be more appropriate to show matching examples where possible.

      o Color code of stainings: Colors do not match throughout the manuscript in immunofluorescence images. E.g., Fig. 4 uses blue, green, red, magenta and Fig. 5 uses blue, green, magenta, cyan. It would be preferable to adhere to one color code. Considering that a significant fraction of the population has red-green color blindness, the latter color code seems more appropriate as it should ensure readability also for color-blind audiences.

      • Small typos:

      o Figure 1 legend: last sentence "The" instead of "Th"

      o Supplementary Figure 1B: PCH-2 is named "PCH-22"

      o Supplementary Figure 2: As in the main figure for neocortical organoids the PCH-1 condition is missing (see comment on organoid growth curves). Additionally, the color/shape code of the plots in B does not always match the legend (e.g., size in left plot is different and color of PCH-3 in middle and left plot differs from legend and right plot).

      o It is unclear why the cortical organoids are referred to as "neocortical organoids" in the figures and the text. The methods and the reference in the methods as well as all major papers rather use the word "cortical".

      References:

      Bardy, C. et al. Neuronal medium that supports basic synaptic functions and activity of human neurons in vitro. Proc National Acad Sci 112, E3312 (2015).

      Eichmüller, O. L. et al. Amplification of human interneuron progenitors promotes brain tumors and neurological defects. Science 375, (2022).

      CROSS-CONSULTATION COMMENTS

      I agree with the comments of the other reviewers and, as they are mostly matching, this reinforces the importance of improving certain aspects of the manuscript. As there are no deviating issues, I do not comment specifically on any reviewer comments.

      Significance

      This work uses organoid technology to shed light on brain region-specific phenotypes in PCH2a. Brain organoids have drastically changed the way we study human neurological diseases (Eichmüller and Knoblich 2022); however, most brain organoid research has focused on cortical organoids. Cerebellar organoid protocols have existed for some time (Muguruma et al. 2015, Silva et al. 2020, Nayler et al. 2021) but have not yet been applied to uncover new disease biology. Especially considering the important role of human-specific cerebellar processes in specific developmental disorders (Haldipur et al. 2021) and cancer (Hendrikse et al. 2022, Smith et al. 2022), disease modeling in human cerebellar organoids holds great potential for understanding disease biology. The work by Kagermeier et al. demonstrates that human cerebellar organoids recapitulate brain region-specific growth deficits and is thus an important step forward for disease modeling. Therefore, this work will be interesting to researchers working on brain development and disease modeling, especially in in-vitro systems. Nevertheless, the mechanistic insight of the study is limited, as is the insight into how human-specific processes might be involved in the pathogenesis of PCH2a. It will therefore be interesting to see how this disease model will be used in the future to investigate the cell types and mechanisms involved in the PCH2a phenotype.

      Personal field of expertise: Brain organoids and disease modeling in organoids especially of neurodevelopmental diseases. Analysis of organoids with stainings, as well as sequencing techniques, and bioinformatics.

      References:

      Eichmüller, O. L. & Knoblich, J. A. Human cerebral organoids - a new tool for clinical neurology research. Nat Rev Neurol 1-20 (2022) doi:10.1038/s41582-022-00723-9.

      Haldipur, P. et al. Evidence of disrupted rhombic lip development in the pathogenesis of Dandy-Walker malformation. Acta Neuropathol 142, 761-776 (2021).

      Hendrikse, L. D. et al. Failure of human rhombic lip differentiation underlies medulloblastoma formation. Nature 609, 1021-1028 (2022).

      Muguruma, K., Nishiyama, A., Kawakami, H., Hashimoto, K. & Sasai, Y. Self-Organization of Polarized Cerebellar Tissue in 3D Culture of Human Pluripotent Stem Cells. Cell Reports 10, 537-550 (2015).

      Nayler, S., Agarwal, D., Curion, F., Bowden, R. & Becker, E. B. E. High-resolution transcriptional landscape of xeno-free human induced pluripotent stem cell-derived cerebellar organoids. Sci Rep-uk 11, 12959 (2021).

      Silva, T. P. et al. Scalable Generation of Mature Cerebellar Organoids from Human Pluripotent Stem Cells and Characterization by Immunostaining. J Vis Exp (2020) doi:10.3791/61143.

      Smith, K. S. et al. Unified rhombic lip origins of group 3 and group 4 medulloblastoma. Nature 609, 1012-1020 (2022).

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We thank all three Reviewers for their thorough assessment of our manuscript and their constructive comments and suggestions.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors generate several variants of actin that are internally tagged with short peptide tags. They identify one particular position that is able to tolerate various tags of 5-10 amino acids and still shows largely unaltered behavior in cells. They study incorporation of their tagged actins into filaments, characterize the interactions of G-actin variants with different associated proteins and show that retrograde actin flow in lamellipodia and the wound healing response of epithelial cells is not affected by the tagged variants. They then apply the tagged actin to study subcellular distribution of different actin isoforms in mammalian and yeast cells.

      The identification of a specific site in the actin protein that tolerates variable peptide insertions is very exciting and of fundamental interest for all research fields that deal with cytoskeletal rearrangements and cellular morphogenesis. The results demonstrating the functionality of actin variants with peptides inserted between aa 229 and 230 are generally convincing and well done. In particular, the generation of CRISPR/Cas9 genome-edited versions of beta- and gamma-actin is impressive. I therefore generally support publication of this study. There are, however, several technical and conceptual issues that should be addressed to improve the quality and scope of the study. I have listed some specific comments below:

      We thank the Reviewer for their constructive comments and general support for publication of our study.

      Major points

      - The biggest issue I have is the last section on the application of tagged actins to study isoform functions. In principle the application is very clear, as there are simply no alternative ways to study isoform distribution in live cells. However, the experimental data are simply not convincing. What the authors define as "cortex" in Fig. 5A seems to rather represent cytosolic background mixed with radial fibers. I am not convinced that even the antibody staining with a relatively clear differential distribution of beta and gamma really shows a genuine accumulation of one isoform on stress fibers. It seems to me that the beta-actin staining has a higher cytosolic background and is generally weaker (gamma nicely labels transverse arcs), which reduces signal/noise and therefore yields a relatively increased level in areas with less-bundled actin. My suggestion is to select more clearly defined actin structures and to use micro-patterned cells to normalize the otherwise obstructing variability in actin organization. Possible structures would be cortical arcs in bow-shaped cells, lamellipodial edges (HT1080 seem to make very nice and large lamellipodia) or cell-cell contacts (confluent monolayer, provided cells don't grow on top of each other). Stress fibers are possible but need to be segmented very precisely, and I did not see any details on this in the methods section. For Fig. 5D: I assume cells were used where only one isoform was tagged? This is technically weak, and the double normalization is probably blurring any difference that might be occurring. Why not use a double-tagging strategy with ALFA/FLAG or ALFA/AU5 tags to exploit the constructs introduced in the previous figures? Also, the unique selling point of the strategy is the possibility of actual live imaging of specific isoforms. Cells that have stably integrated double tags and then transiently express nanobodies for ALFA and either AU5 or FLAG (or others if those don't exist) would make this possible. Considering the work already done in this manuscript, such an approach should actually be possible. Did the authors attempt this, or is there a reason it is not discussed? If double-tagged cells are not possible for some reason, it should at the very least be possible to combine ALFA detection with the specific antibody against the other isoform and get rid of the double normalization.

      We thank the Reviewer for the various suggestions regarding the comparison between the localization of the tagged and native isoforms. In our reply below, we will separately discuss the possibilities and our considerations for fixed samples and live cell imaging. We apologize for the lengthy response but for transparency reasons, we would like to give a thorough overview of our efforts for isoform-specific localization in cells, something for which we have limited space in the manuscript.

      Fixed samples:

      It was a significant experimental challenge to compare the labeling of the β- and γ-actin specific antibodies with our internally tagged actin system (Fig. 5A-D). The reason for this is that labeling the samples with the β- and γ-actin specific antibodies requires treatment with methanol (Dugina et al., J Cell Sci, 2009), most likely to disturb the interaction of actin with actin-binding proteins that prevent the binding of the antibodies due to steric hindrance. Methanol treatment, however, precludes co-labeling with phalloidin, likely due to changes in the tertiary/quaternary protein structure of F-actin. Initially, we put a lot of effort into co-labeling samples with phalloidin and the actin-specific antibodies, but even very brief methanol treatment (seconds), before or after phalloidin labeling, completely prevents or reverses the binding of phalloidin. Importantly, the ALFA tag labeling was also suboptimal after methanol treatment.

      The fact that we could not perform these double labelings led us to perform different ratio calculations for the β- and γ-actin antibody and the ALFA tag labeling. In the case of the antibody immunofluorescence labeling, we simply divided the signal of the β-actin and γ-actin since we could simultaneously label the isoforms in the same cell. In the case of the ALFA tag labeling, we used phalloidin for independent signal normalization and then performed a second normalization. Although this complicates the normalization procedure (ALFA tag signal of β- and γ-actin is first normalized to total F-actin and then a ratio is calculated) and understandably leads to some confusion, this was the only way forward to obtain the results presented in the manuscript.
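
      To make the two calculations described above easier to follow, they can be sketched as below. This is only an illustrative sketch of the arithmetic as we describe it in this reply, not our analysis script; the function names and all intensity values are hypothetical, and background subtraction is included on the assumption that each region is corrected with a value measured in a cytosolic area.

```python
def background_corrected(signal, background):
    """Subtract the background intensity and clip negative values at zero."""
    return max(signal - background, 0.0)


def antibody_ratio(beta, gamma, bg_beta, bg_gamma):
    """Single ratio: both isoforms are labeled in the same cell and region."""
    return background_corrected(beta, bg_beta) / background_corrected(gamma, bg_gamma)


def alfa_over_phalloidin(alfa, phalloidin, bg_alfa, bg_phalloidin):
    """First normalization: ALFA tag signal relative to total F-actin."""
    return background_corrected(alfa, bg_alfa) / background_corrected(
        phalloidin, bg_phalloidin
    )


def alfa_isoform_ratio(beta_norm, gamma_norm):
    """Second step: ratio of the two phalloidin-normalized ALFA signals."""
    return beta_norm / gamma_norm


# Hypothetical intensities (arbitrary units) for one region type.
ab_ratio = antibody_ratio(beta=120.0, gamma=220.0, bg_beta=20.0, bg_gamma=20.0)

# ALFA signals come from different cells: one with tagged beta-actin,
# one with tagged gamma-actin, each with its own phalloidin signal.
beta_norm = alfa_over_phalloidin(70.0, 110.0, bg_alfa=20.0, bg_phalloidin=10.0)
gamma_norm = alfa_over_phalloidin(100.0, 90.0, bg_alfa=20.0, bg_phalloidin=10.0)
alfa_ratio = alfa_isoform_ratio(beta_norm, gamma_norm)

print("antibody beta/gamma ratio:", ab_ratio)
print("ALFA beta/gamma ratio after phalloidin normalization:", alfa_ratio)
```

      The sketch makes explicit that the antibody ratio uses two signals from the same cell, whereas the ALFA ratio combines signals from separately tagged cells and therefore needs phalloidin as a common reference.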

      The Reviewer points out that “What the authors define as "cortex" in Fig. 5A seems to rather represent cytosolic background mixed with radial fibers.”. In our images, we observe very little cytosolic background from both antibody stainings. More importantly, for the quantitative analysis, the fluorescence intensity values were corrected for the background values observed in cytosolic areas so even if the signal is present, it should not affect our analysis. We do admit though that we could have been more careful with the term “cortex” since the observed signal could indeed be a mix of radial fibers and the actin cortex. The reviewer further states that “I am not convinced that even the antibody staining with a relatively clear differential distribution of beta and gamma really shows a genuine accumulation of one isoform on stress fibers.” Although the differences are small, we consistently observe a differential fluorescence intensity of β- and γ-actin in actin-based structures with a relatively stronger signal of γ-actin in stress fibers (Fig. 5C). Since we always normalize the fluorescent signal intensity per cell, this strongly indicates a genuine accumulation of one isoform over the other in specific actin-based structures. This observation is very consistent in our experiments and also aligns with many published studies where differences in the localization of β- and γ-actin are reported in various cell types (Pasquier et al., Vasc Cell, 2015; van den Dries et al., Nat Comms, 2019; Malek et al., Int J Mol Sci, 2020). As for the segmentation, we mentioned in the Methods section that we selected small regions (0.5x0.5mm) that exclusively contain stress fiber or “cortex” regions. The regions shown in Fig. 5B are therefore larger than the analyzed regions, something which we will better indicate in the revised manuscript.

      Planned revision: We will provide a more detailed explanation of our quantitative analysis in the Methods section so that it is clearer how our normalization procedure was performed. Furthermore, we will adapt Fig. 5A-B such that it better visualizes how we defined the regions for quantification. As per the Reviewer’s suggestion, we will also apply a different experimental method to show that the tagged isoforms properly localize to actin-based structures. For this, we will attempt to use micropatterned cells to induce clearly defined actin-based structures (the crossbows as suggested by the Reviewer) and also explore the possibilities of investigating the differential localization in double-tagged cells. We will also reconsider the use of the term “cortex” for the region that is pointed out in Fig. 5A-B.

      Live cell imaging:

      We agree with the Reviewer that it would be very valuable to attempt simultaneous live cell imaging of two isoforms. Yet, for this, we would need two tag/fluorophore systems that allow the visualization of internally tagged isoforms in living cells. As presented in our original manuscript, we have successfully inserted many different epitope tags (FLAG/AU1/AU5/ALFA) in the T229/A230 position to demonstrate the versatility of our tagging approach. Yet, despite significant efforts to identify a second tag/fluorophore system that would allow isoform-specific live cell imaging, we only succeeded in designing one strategy to perform live cell imaging, i.e. with the ALFA tag (Götzke, Nat Comms, 2019). Part of the reason for this is that so far, no high affinity nanobodies have been generated against the classical epitope tags (FLAG, AU5 etc.). This is an established challenge since classical epitope tags are typically linear/unstructured while nanobodies require folded secondary structures for epitope recognition such as alpha helices (the ALFA tag was specifically designed as such).

      Besides the successful ALFA tag approach, we have tried the following additional approaches for live cell imaging: 1) full-length GFP, 2) full-length GFP with linker, 3) GFP11 (to complement with GFP1-10; Cabantous et al., Nat Biotech, 2005), 4) GFP11 with linker, 5) FLAG Frankenbodies (Zhao et al., Nat Comms, 2019; Liu et al., Genes Cells, 2021) in FLAG IntAct cells and 6) tetracysteine/FlAsH labeling. Importantly, each of these additional internally tagged actins, except for those that contained full-length GFP, showed a high colocalization with the cytoskeleton, again demonstrating the versatility of the T229/A230 position to tag actin. Unfortunately, none of these approaches satisfactorily visualized the actin isoforms in living cells. We will therefore briefly summarize our findings here.

      (1-2, integration of full-length GFP and GFP with linker) Perhaps unsurprisingly, integrating the entire coding sequence of GFP, or GFP flanked by linkers (each 5AA in length), within the T229/A230 position did not result in a proper localization of actin.

      (3-4, integration of GFP11 and GFP11 with linker) Next, we assessed the localization of the GFP11 tagged actin versions (GFP11: 16AA, GFP11+linker: 26AA). Because GFP11 is not visible without GFP1-10 complementation, we also tagged these actins at the N-terminus, simply as a proof of concept to determine where the internally tagged actins would end up. Interestingly, both GFP11-actin and GFP11+linker-actin properly integrated within the cytoskeleton as demonstrated by the FLAG staining. This again demonstrates the versatility of the T229/A230 position and strongly suggests that even the integration of 26AA within this position only minimally affects the polymerization of actin into the cytoskeleton.

      (3-4) After confirming the proper integration of GFP11-actin and GFP11+linker-actin, we continued by expressing GFP1-10 in these cells. Unfortunately, this resulted in no or only very minimal localization of the actin to the cytoskeleton, demonstrating that GFP complementation hampers the integration into the cytoskeleton.

      (5, use of FLAG Frankenbodies) We also expressed FLAG Frankenbodies in our FLAG IntAct cells in an attempt to visualize the isoforms in living cells. FLAG Frankenbodies are single chain antibodies fused to GFP and can be expressed in cells to visualize FLAG-tagged proteins (Liu et al., Genes Cells, 2021). Although a cytoskeletal labeling was indeed discernible in some cells, the FLAG Frankenbody signal overlapped much less with the total actin signal as compared to the FLAG immunofluorescence labeling, indicating that the incorporation of the FLAG-tagged actin was much less in the presence of the FLAG Frankenbody. Also, a significant fraction of the cells demonstrated a homogeneous cytosolic signal.

      (6, use of tetracysteine/FlAsH) Although the tetracysteine tag/FlAsH system is widely known to induce artefacts, we still aimed to evaluate it for live cell imaging of IntAct actins. Similar to GFP11, we first determined the integration of tetracysteine-actin into the cytoskeleton with the use of an additional N-terminal FLAG tag and demonstrated that it was properly integrated into the actin cytoskeleton. Unfortunately, after brief incubation with FlAsH-EDT2, we noted 1) a significant amount of background fluorescence, preventing proper actin visualization and 2) that the cells became static, indicating toxicity of the FlAsH-EDT2 compound. Titrating down the amount of FlAsH-EDT2 did not alleviate these drawbacks and only resulted in less fluorescence.

      Overall, based on these experiments, we concluded that the T229/A230 position itself is very versatile, as demonstrated by the proper localization of the GFP11-actin variants and the TetraCys-actin. At the same time, none of these tag/fluorophore systems properly visualized actin in living cells. Although we are unsure what the reason is for this, it is easily imaginable that the on/off kinetics of the split GFP system and the FLAG Frankenbodies are suboptimal to allow for the rapid and continuous integration of actin monomers into the F-actin cytoskeleton. We therefore also concluded that currently, the ALFA tag/nanobody system is apparently unique in its ability to visualize epitope tagged actin in living cells (as shown in the manuscript). For simultaneous visualization of multiple isoforms, we rely on progress on the development of novel nanobody-based tags, something we hope the Reviewer will agree is outside the scope of the current work.

      *- The authors make a point of comparing the internally tagged actin to N-terminal tags that are mostly functional but have been shown to affect translational efficiency. I would strongly suggest to include N-terminally tagged actin as control for all assays in this study. Also for the physiological assays (retrograde flow, wound healing), a positive control is missing that shows some effect. Previous studies showed defects with transiently expressed actin with an N-terminal GFP. As retrograde flow measurements are very sensitive to the exact position of the kymographs and wound healing assays is a very crude and indirect readout, such a positive control is essential. *

      We acknowledge that N-terminally tagged actin has been used extensively for actin research (especially before the introduction of Lifeact). For our studies, however, we were specifically interested in whether the internally tagged actins show similar characteristics as compared to wildtype actin. We have not included N-terminally tagged actin in all of our experiments, since this would not affect our conclusions with respect to the functionality of our internally tagged actins. We expect that the comparison between internally and N-terminally tagged actins could be very instrumental for future investigations, for example to further establish the importance of actin N-terminal modifications in the differential regulation of actin isoforms. Yet, we consider this comparison outside the scope of the current manuscript. For now, the results in the manuscript provide evidence that our approach is unique in that it allows isoform-specific tagging without manipulating the N-terminus. As such, our internal tagging system complements the already existing repertoire of actin reporting methods (N-terminal fusion, Lifeact, F-Tractin, actin nanobodies) and allows researchers to study so far unknown properties of actin variants.

      *- Expression of tagged actins in yeast is a very nice idea but it would be far more informative to express the tagged forms as the only copy of actin. This can either be done by directly replacing endogenous actin gene in S. cerevisiae, or (if the tagged versions are not viable) - using the established plasmid shuffle system (express actin on counter-selectable plasmid, then knock out endogenous copy and introduce additional plasmid with tagged actin, then force original plasmid out). In the presence of endogenous S. cerevisiae actin the shown effects are very hard to interpret as nothing is known about relative protein levels (endogenous vs. introduced). Also, if constitutive expression of the ALFA nanobody is harmful for integration into cables, why not perform inducible expression of the nanobody and observe labeling after induction. For the live imaging a robust cable marker is needed, like Abp140-GFP. Finally, indicate the sequence differences between the used actin forms in yeast (supplementary figure with sequence alignment and clear indication of all variations) *

      We thank the reviewer for their positive comments and feedback regarding expression of IntAct variants in yeast. Currently, we have expressed IntAct as an extra copy in the presence of native Act1 of S. cerevisiae. All the IntAct variants have been expressed under a commonly used constitutive TEF1 promoter. We agree with the Reviewer that it would be valuable to attempt to express the tagged forms as the only copy of actin.

      Planned revisions:

      1) As per the Reviewer’s suggestion, we will attempt to make yeast strains with IntAct as the sole expressed actin copy by using the well-established 5-FOA-based plasmid shuffle system in yeast. We will use a ∆act1 strain containing wildtype act1 in a centromeric ura-plasmid described in Harrer et al., 2007 (generously shared by Prof. Jessica and Prof. Amberg at Upstate Medical University of New York, USA) and express IntAct exogenously via additional plasmids. Shuffling of these strains on 5-FOA will cause the loss of the ura-plasmid containing the wildtype act1 copy and will determine whether yeast cells are able to survive with IntAct as the sole source of actin. If the cells do survive with IntAct as the sole copy, we will perform subsequent analysis to assess actin cytoskeleton organization under these conditions.

      2) As the reviewer has mentioned, expression of NbALFA during live-cell imaging experiments hindered incorporation of IntAct into linear actin cables in yeast (Suppl. Fig. S13). As per the reviewer’s suggestion, we will now try to create an inducible-expression system for the NbALFA-mNG and observe its effects on incorporation into formin-made actin cables after induction. We have already created NbALFA-mNG constructs under galactose-inducible GALS and GAL1 promoters and are currently constructing yeast strains for these experiments.

      3) We will add an extra supplementary Figure to indicate the sequence differences of the various actin variants that we have expressed in yeast.

      - As the authors clearly show good integration of several tagged actins into filaments I would expand the structural characterization: perform alpha fold predictions of actin monomer structures including the various tags to show the expected orientation. It is striking that the only integration site that seems to work well is at the last position of a short helix, indicating that the orientation of the integrated peptide might be fixed in space and be optimal to minimize interference. Also, a docking of the tag onto the recently published cryoEM structures of the actin filament should be shown to indicate where it resides compared to tropomyosin or the major groove where most side binding proteins seem to bind.

      We already performed AlphaFold predictions of the tagged actin monomers, but we have decided not to include these predictions in the manuscript for two reasons. First and foremost, while the prediction confidence of the non-tagged region is very high (pLDDT > 90), the prediction confidence of the tagged region is very low. According to the AlphaFold FAQ (https://alphafold.ebi.ac.uk/faq), pLDDT values below 70 should be treated with caution and values below 50 should not be interpreted. Intriguingly, the low confidence aligns with the fact that for both tags, the prediction does not match known features of the tag. The FLAG tag should be a linear/unstructured region in order to be recognized by the antibody and the ALFA tag should organize into an alpha helix (Götzke et al., Nat Comms, 2019). Yet, in the prediction, the FLAG tag partially continues as an alpha helix and the ALFA tag is only a small helix with part of the tag being unstructured. The second, more minor reason for not including the predictions is that AlphaFold does not predict to what extent the tag is flexible, which means that even if the tagged region is predicted correctly, it is difficult to say whether the region will interfere with the binding of proteins.

      Despite the low prediction confidence, we used the published actin-tropomyosin cryoEM structure (von der Ecken et al., Nature, 2015) to replace WT actin with ALFA tag actin and the results are shown below. Again, although results should be interpreted with caution, the tag does not seem to obstruct monomer-monomer interactions within an F-actin filament and also the tropomyosin binding surface is relatively distant from the tag region, suggesting that these interactions are likely not disturbed by introducing the tag.

      - For any claims regarding usability of tagged variants for isoform research it would be very important to characterize the known posttranslational modifications of tagged actin variants - are the differences between beta and gamma maintained on this level as well?

      Planned revision: Following the Reviewer’s suggestion, we will perform a western blot analysis to compare posttranslational modification (arginylation) of tagged and wildtype actins.

      Technical issues

      - There is no scale for the color coding in Fig. 5A, B

      We deliberately did not add a numerical scale because the images are normalized, which means that presenting the actual numbers might be misleading. The numbers could be interpreted as if they represent the amount of β-actin relative to γ-actin, which is not the case due to staining differences and the normalization procedure.

      - The y-scales for Fig. 5C and D need to be identical to allow direct comparison

      Planned revision: We will adapt the scale of Fig. 5D to make it identical to Fig. 5C. Following the other suggestions of the reviewer, we will also critically evaluate our normalization procedure and present those numbers in Fig. 5C-D if the values turn out to be different.

      - Pearson coefficient should not be normalized to a control value as its already a dimensionless parameter. Always report actual R-value - also remove R2 values for Pearson as this makes no sense in this context (not sure if it was a typo or intended).

      We normalized the Pearson coefficient values for visual representation of the results. The majority of the raw coefficient values (more than 80%) are between 0.20 and 0.75 (see raw values in the associated Excel file). Theoretically, Pearson coefficient values are possible between 1 (or -1 for negative correlations) and 0. The much smaller window in our values as compared to the theoretical window (0.55 vs 1) led us to normalize the values such that they can be presented on a scale from “maximum expected colocalization” to “minimum expected colocalization”. In this way, the differences between the various tagged actins are much better appreciated in the Figure. As to reporting the R2, the Reviewer is correct. Reporting the R2 is an inadvertent mistake from our side and we will correct it.

      Planned revision: We will change the R2 in the text to PCC or Pearson Correlation Coefficient.
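
      For clarity, the rescaling discussed above can be illustrated with a short sketch. It is an assumption-laden example rather than the code used for the Figure: the function names are hypothetical, and the reference values 0.20 and 0.75 are taken from the raw range quoted above purely for illustration, since the exact “minimum” and “maximum expected colocalization” references are not specified here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists of pixel intensities."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rescale(pcc, pcc_min, pcc_max):
    """Min-max rescaling: pcc_min maps to 0 and pcc_max maps to 1."""
    return (pcc - pcc_min) / (pcc_max - pcc_min)


# Illustrative reference values only (assumed, see lead-in).
PCC_MIN, PCC_MAX = 0.20, 0.75

raw_pcc = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # perfectly correlated toy data
scaled_mid = rescale(0.475, PCC_MIN, PCC_MAX)        # midpoint of the raw window
```

      Because the rescaling is a linear transformation, the ranking of the tagged actins is unchanged and the raw coefficients can always be reported alongside the rescaled values.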

      *- All values on subcellular regions (like stress fiber or cortex) depend critically on the way these regions were thresholded or identified. Provide all details on how this was done in the methods section and ensure that adequate background subtraction and normalization is applied. Optimally, an unbiased (AI or automated) approach based on simple image statistics is used for this to avoid personal bias. *

      Planned revision: As also indicated above, we will add new experiments to better compare the localization of the isoforms in tagged and parental cells. These new experiments will also be accompanied by a more detailed explanation of how the regions were selected and quantified.

      - In Fig. 2A only heterozygous FLAG-actin cells are used. Why not use a homozygous line (for both beta and gamma actin)? The nice band shift of the FLAG version would allow the precise quantification of the fraction of total actin covered by beta and gamma actin, which then could provide some additional info for the apparently weaker beta staining in Fig. 5 (if beta expression is simply weaker). This would be a very simple and useful advantage of the internal tags that could be widely applied.

      In Fig. 2A, we used the heterozygous FLAG-actin cells to directly compare the production of β-actin from the knock-in allele and the wildtype allele in the same cells. The fact that the two bands observed in this western blot analysis (upper and lower) are almost the same (with the FLAG band being a bit more intense) provides the strongest indication that the tag does not interfere with the expression of actin. In Suppl. Fig. 5D, we show that the expression of β-actin is also unaffected in the hemizygous FLAG actin cells, which exclusively express tagged actin.

      Planned revision: As per the Reviewer’s suggestion, we will also add a western blot analysis on the expression of both actin isoforms and total actin in hemizygous cells.

      *- Fig. 3: control with N-terminal tag is missing. Also, why is it not possible to assay filament binding factors like Myosin, Filamin or alpha actinin - instead of co-IP a simple co-sedimentation assay with cell extracts in F-buffer should pick up any major difference in decoration of filaments containing the ALFA tag. Using two speeds for centrifugation it might even be possible to observe effects on filament bundling. The best approach for this would of course be to purify tagged actins and perform in vitro assays but this is clearly beyond the scope of what the authors intended here. I personally think that a broad acceptance of the marker will only come once the biochemistry has been sufficiently characterized so this is a future direction I would strongly encourage. *

      We kindly refer to our response on Page 5/6 for why we have not included the N-terminal control.

      Planned revision: The co-sedimentation assay is an excellent suggestion by the reviewer. Following the Reviewer’s suggestion, we will perform F/G-actin fractionation and assess the presence of several F-actin associated proteins in the F-actin fraction.

      - Fig. 2A has no loading control

      We show this western blot to indicate that the WT actin and tagged actin are expressed at similar levels in the heterozygous knock-in cells. For this, no loading control is needed because we only compare the intensity of the upper band (tagged actin) with the lower band (WT actin).

      - The RPE-1 data are confusing as several constructs show very different localization (completely cytosolic) to HT1080 cells and there is no possible explanation given for this. Maybe simply remove this data set?

      We agree with the reviewer that the differences in the localization between some of the internally tagged actins between the HT1080 and RPE1 cells might be confusing, especially for the A230-A231 variant for example. Yet, the fact that also in these cells, the T229-A230 variant performs equally well as compared to N-terminally tagged actin is an important confirmation that this variant is properly integrated into actin-based structures, independent of cell type. This makes the support for choosing this variant to continue with our studies stronger. A possible explanation for the differences is that RPE1 cells in general tend to form more stress fibers as compared to the HT1080. Since the localization to stress fibers is different between the internally tagged actins, this may explain the differences observed in colocalization.

      Planned revision: We will add a short text, in the Results or the Discussion, on the differences between the colocalization values between HT1080 and RPE1 cells.

      *- The angle measurements for lamellipodial actin are not very meaningful: the angle is determined for the radial bundles, which do not correspond to the Arp2/3 angle of single filaments and is likely the result of different nucleation factors. I would suggest to remove this. If angle measurements are really intended, cryoEM needs to be performed. *

      We apologize for this misapprehension from our side which is also noted by the other two reviewers. In the treadmilling videos of the lamellipodia in HT1080 cells, which were obtained using Airyscan super-resolution microscopy, we clearly observe a consistent filament formation at a constant angle, something which we interpreted as the angle between the mother filament and the daughter filament. After consulting the literature, we indeed have to admit that this cannot be interpreted as such and we will remove these datasets.

      Planned revision: We will remove the datasets with the angle measurements (Suppl. Fig. 7A-B) from our manuscript.

      - Replace all SEM with SD values - use at least 3 biological replicates (4D SEM of n=2)

      Planned revision: We will carefully check our statistics and revise where appropriate.
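
      To make the distinction raised by the Reviewer concrete, the following minimal sketch shows how SD and SEM relate for a small number of replicates. The function name and the two example values are hypothetical and do not correspond to any dataset in the manuscript.

```python
import math
import statistics


def sd_and_sem(values):
    """Return the sample standard deviation and the standard error of the mean."""
    sd = statistics.stdev(values)        # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(len(values))    # SEM shrinks with sqrt(n)
    return sd, sem


# Two hypothetical replicate measurements (n = 2).
sd, sem = sd_and_sem([1.0, 3.0])
```

      With only two replicates the SEM is smaller than the SD by a factor of √2, so it understates the spread of the individual measurements, which is why reporting the SD together with the number of replicates is the more transparent choice.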

      Minor points

      - Intro: after listing all the details already understood on actin isoforms it is not very convincing to simply state the molecular principles remain largely unclear (l 34) - maybe better "there is no way to study actin dynamics due to current limitations of specific antibodies to fixed samples". An interesting option would actually be to develop nanobodies that are isoform specific.

      We will rephrase the text in the introduction. Regarding the development of isoform-specific nanobodies: although this sounds like a promising way forward, it would likely not result in isoform-specific targeting in living cells. Similar to the antibodies, isoform-specific nanobodies would have to be generated against the N-terminus which, under native conditions, is likely not available due to its occupation by actin-binding proteins. Also, since the N-terminus is not structured, it may be extremely challenging to generate nanobodies against these epitopes.

      *- L 71: "involved" in the kinetics is not a good term - maybe affects or regulates.... *

      We will rephrase the text.

      - L148: "suspect" instead of "expect" - this clonal variation is actually a big danger of the employed approach as possible defects in actin organization could be masked by compensatory changes - it would generally be good to show critical data for at least 3 independent clones to rule out dominant selection effects.

      We will rephrase. We agree that clonal variation could be a danger if actin levels are to be investigated. For future follow-up studies, we plan to make additional cell lines to avoid clone-specific conclusions.

      **Referees cross-commenting**

      *I completely agree with the comments by reviewer 2 on the various missing controls - adding several or all of those will make the results much more convincing. The key for the adaptation of any new actin probe will be the level of confidence researchers have in the documented effects. Even some negative effects on actin behavior (I am sure there will be some) should not prevent usage of the strategy as long as there is robust and convincing documentation of those effects. I also agree that including some basic in vitro characterization will go a long way to convince people directly working on actin (there is a very high level of biochemical understanding in that field). *

      Planned revision: We will perform the essential controls as suggested by Reviewer 2. Furthermore, for future experiments, we do envisage producing and purifying internally tagged actins and investigating their binding properties in in vitro reconstitution assays. We have already started optimizing these approaches through our ongoing collaboration (KD, SP).

      Reviewer #1 (Significance (Required)):

      *Significance: Very useful finding that can be applied to any question related to actin-dependent cellular processes (morphogenesis, cell division, cell polarization, cell migration etc.) *

      *Strength: main finding convincing, strong genome edited cell lines *

      *Limitations: application to study of isoforms very limited and data not convincing, statistics and image quantifications need improvement *

      *Advance: identify new location for integral tagging of actin, which was not really possible before. The main relevance is for fundamental cell biology but the approach can also be applied to the study of disease variants in actin. *

      Audience: general cell biology - very broad interest

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Actin is highly sensitive to modifications, and tagging it with fluorescent proteins or even smaller motifs can affect its function. The most well-known example of this is that fission yeast where actin has been replaced with GFP-actin are inviable (Wu and Pollard, Science 2005) because the labeled actin cannot incorporate into the formin-dependent filaments that make up the cytokinetic ring. Subsequent experiments revealed that formins filter out GFP-actin monomers, as well as monomers that are labeled with smaller fluorescent motifs (Chen et al, J. Structural Biology 2012). Further, attempts to make mammalian cell lines where GFP-beta-actin was knocked into one allele resulted in extreme down-regulation of the GFP-labeled actin, indicating that there is some implicit toxicity with the labeled version. To my knowledge, all attempts at making homozygous GFP-actin knock-ins have been unsuccessful. Therefore, while GFP-actin or other labeled variants can be over-expressed in many different cell types with some success, there is always the question of how faithfully the labeled actin represents bona fide actin localization and dynamics.

      To address this van Zwam et al. have developed a clever strategy of screening actin for internal motifs that can tolerate incorporation of a tag without affecting its function. They appear to have found a good candidate, named IntAct, and provide evidence that this tagging position allows the actin to be functional in both human and yeast cells. The work is very promising, and many of the assays performed satisfy the criteria of rigor and reproducibility. Importantly, the authors have created knock-in human cell lines where the tagged actin is expressed at normal levels, including a double allele knock-in that is viable and has normal proliferation and motility. Additionally, the authors show that labeled S. cerevisiae actin can incorporate into actin cables, which are formin dependent. IntAct constructs were shown to interact with several well-known actin binding proteins and localized well to many different actin structures. There was also interesting data obtained from tagging both beta and gamma actin in human cells. However, as an actin scientist eager for new probes to visualize actin in cells, there are still questions about the functionality of these probes. Addressing these issues, listed below, would alleviate the concerns I still have about IntActs after going through the manuscript. IntActs have the potential to have a large impact on cytoskeletal research if it can be rigorously documented that they are functionally as close to unlabeled actin as possible.

      We thank the Reviewer for their constructive comments and general positive evaluation of our study.

      Reviewer #2 (Significance (Required)):

      Concerns:

      1. There are no negative controls performed for either the fixed or live-cell imaging of IntAct. Since the fixed cell data is heavily reliant on the presence of FLAG-labeled puncta at actin filaments, it is important to show that the immunocytochemistry protocol doesn't produce anything that would mimic the localization of actin. For the live cell data, there has been no effort made to show that the binding of the nanobody to the ALFA tag on IntAct is specific.

      Planned revision: We will add the following controls to exclude that any of the labeling procedures produces anything that would mimic the localization of actin: 1) Immunofluorescence staining of the used tags (FLAG/ALFA) in cells that do not have tagged actins 2) Expression of ALFA-Nb-GFP and ALFA-Nb-mScarlet in cells that do not have tagged actins 3) Expression of free GFP in cells that have tagged actins. We will co-stain these cells with phalloidin to visualize F-actin and determine if any signal is specifically localized to the actin cytoskeleton.

      2. The homozygous ALFA-tagged IntAct cells have a 50% reduction in the amount of actin expression (Fig. 2D). What is the F:G ratio in these cells? The F:G measurement is only shown for the FLAG-tagged heterozygous IntAct cells, which have the worst co-localization with phalloidin (Fig. 2F) and were not used for subsequent figures. I appreciate that motility and proliferation were measured and shown to not be affected (Fig. 4D,E) , but in our lab reducing the amount of polymerized actin by 50% (which may be more in ALFA-tagged IntAct cells if the F:G changes) has catastrophic effects on other cytoskeletal and organelle systems. Since the homozygous ALFA IntAct cells are the main ones used in the manuscript, they should be the ones that are fully characterized.

      We would like to point out that the reduction is only 20-25 percent depending on the specific western blot analysis and the loading control. Still, the Reviewer is correct about the necessity of the F:G actin measurements of the ALFA-tagged IntAct cells and we therefore included those as Suppl. Fig. 9 in the original manuscript (text on page 9). The quantification of these assays clearly demonstrated that the F:G actin ratio in the ALFA-tagged IntAct cells is the same as in parental cells.

      3. It is not addressed whether expressing the ALFA-Nb-GFP construct in ALFA-IntAct cells alters actin properties. This is essential information for live cell imaging experiments.

      Planned revision: We have already performed proliferation and migration experiments in cells that stably express the ALFA-Nb-GFP. These data indicated that proliferation and migration are not affected by the presence of the nanobody and these data will be included in the revised manuscript. To note, in the original manuscript, we already showed that treadmilling of actin at the lamellipodia is not affected by the presence of the ALFA-Nb-GFP.

      4. It is not addressed how much of the ALFA-IntAct gets labeled with ALFA-Nb-GFP and how uniform the labeling is.

      We do not understand this specific request of the Reviewer. To our knowledge, it is not possible to assess how much of a probe (in this case the ALFA-Nb-GFP) binds the target (in this case the ALFA-IntAct actins) in living cells. This is not only the case for the ALFA-Nb-GFP but also for any other probe. As an example, when expressing Lifeact, we also do not know how much of the actin molecules within F-actin get labeled with Lifeact and how uniform the labeling is. From the results of the live-cell imaging we can only conclude that the binding is at least so effective that we can readily observe and discern all the actin-based structures that are also observed by Lifeact (see Suppl. Fig. 8 for Lifeact-GFP/ALFA-Nb-mScarlet cotransfection). Whether the regions that do not have F-actin only contain ALFA-Nb-GFP that is bound to actin monomers or also contains a significant fraction of free ALFA-Nb-GFP seems an issue that cannot be addressed.

      5. To assess lamellipodia architecture, "branched actin angle" is measured using AiryScan imaging of actin filaments. This type of microscopy does not offer the ability to image individual actin filaments; what is actually being measured is the orientation of actin bundles to each other. It should be impossible to image the orientation of actin filaments in Arp2/3 dendritic networks and it is surprising that the measurements average to 70 degrees. A suitable substitute for this would be to measure the size and amount of F-actin in phalloidin-stained lamellipodia using kymograph analysis.

      We apologize for this misapprehension from our side which is also noted by the other two reviewers. In the treadmilling videos of the lamellipodia in HT1080 cells, which were obtained using Airyscan super-resolution microscopy, we clearly observe a consistent filament formation at a constant angle, something which we interpreted as the angle between the mother filament and the daughter filament. After consulting the literature, we indeed have to admit that this cannot be interpreted as such and we will remove these datasets.

      Planned revision: We will remove the datasets with the angle measurements (Suppl. Fig. 7A-B) from our manuscript.

      6. Was it possible to make an IntAct gene substitution in yeast?

      Planned revision: We thank the reviewer for this interesting question and as also suggested by Reviewer 1, we are now constructing yeast strains with IntAct as the sole expressing actin copy by using the well-established plasmid shuffle system in yeast. The results of these experiments will determine the ability of IntAct to completely substitute actin function in yeast.

      Also, while this is not necessary for this manuscript, making a fission yeast strain where actin has been substituted with IntAct and demonstrating that IntAct gets incorporated into the cytoplasmic ring and into Cdc12p-polymerized filaments would alleviate MANY potential concerns people would have about these probes by directly assessing situations were other labeled actins have been documented to fail. Along the same lines, it would have been nice to see a comparison in some of the assays of ALFA-IntAct and GFP-actin or another labeled actin variant.

      We thank the reviewer for their constructive feedback and completely agree that it is important to document how IntAct behaves in scenarios where other labelled actins have failed. As a proof of principle, IntAct incorporates into both formin- and Arp2/3-made linear and branched actin filaments in yeast (Fig. 5E, Suppl. Fig. 14). These data show that the IntAct labelling strategy is the first to achieve good integration into both of these structures, as previous efforts with labelled actin such as GFP-actin failed to incorporate into formin-made actin filaments (Doyle et al., PNAS, 1996). Thus, we believe that IntAct does perform better than other labelled actins in yeast, although further optimizations are required to overcome limitations regarding incorporation into actin cables in the presence of the ALFA nanobody.

      Planned revision: We have already extended the applicability of IntAct to another well-known fungal model system, the fission yeast Schizosaccharomyces pombe (S. pombe). We expressed IntAct variants of human β- and γ-actin, budding yeast actin (Sc-IntAct) and fission yeast actin (Sp-IntAct) from an exogenous plasmid under the native S. pombe actin promoter in an S. pombe strain that constitutively expresses the Nb-ALFA-mNG. Live-cell microscopy of S. pombe cells expressing these proteins revealed that all IntAct variants localize to actin patch-like structures located at the cell poles and the cell division site (during cytokinesis). These structures show dynamics similar to those previously reported for actin patches of S. pombe (Pelham et al., Nat Cell Biol, 2001). These preliminary results suggest that IntAct proteins localize only to the branched actin networks found in the actin patches of S. pombe, as we had previously observed for the budding yeast S. cerevisiae (Fig. S13 in manuscript). The underlying mechanism for this exclusion from the linear actin cable network in both budding and fission yeast remains unknown and may represent an inherent specificity and sensitivity of yeast formins. Our current and future experiments will express IntAct variants in the absence of the ALFA nanobody and determine the level of incorporation into actin cables, patches, and the actomyosin ring.

      Planned revision: We have also already performed a quantitative analysis to ascertain the effect of Sc-IntAct expression on cortical actin patch dynamics, which represent sites of endocytosis in yeast (Young et al., J Cell Biol, 2004; Winter et al., Curr Biol, 1997). We compared actin cortical patch lifetimes between wildtype cells and cells expressing Sc-Act1 or Sc-IntAct as an extra copy. We used Abp1-3xmcherry as a marker for actin patches and quantified the time window between the appearance and disappearance of a patch (actin patch lifetime) from time-lapse microscopy experiments. Our preliminary results indicate that actin patch lifetimes are unaffected by exogenous expression of either Sc-Act1 or Sc-IntAct, suggesting that IntAct does not negatively influence or alter actin patch dynamics. These observations suggest its applicability as a direct visualization strategy for actin at the cortical patches in budding yeast alongside existing surrogate markers like Abp1, Arc15, etc. (Goode et al., Genetics, 2015; Wirshing et al., J Cell Biol, 2023).

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      This paper tackles a new strategy to tag actin in cells, by identifying that incorporation of a tag of moderate size in subdomain 4 of actin minimally affects actin dynamics in cells, and does not perturb its interaction with known partners, as observed in pull-down assays.

      Major comments:

      The paper is interesting and experiments are convincing.

      My main concerns are the following:

      - Varland et al, is reporting a phosphorylation on Thr229 : I think the authors should mention and discuss this potential PTM that could be affected in IntAct.

      We thank the Reviewer for pointing this out. We are aware of this review that includes phosphorylation on Thr229 as a possible PTM. Yet, this PTM is only reported in one of the Tables of the Review and not further discussed in the text. It is also unclear how the authors determined that Thr229 is a possible phosphorylation site except for the notion that this residue is a threonine and exposed at the surface of the actin molecule. Together with the fact that there is no evidence from primary studies that Thr229 is phosphorylated, we therefore decided to not include it in our discussion.

      - The sequence in subdomain 4 (the alpha helix containing T229A230) is extremely conserved in animals, as well as in between the 6 human actin isoforms. This usually indicates a strong selection pressure on the residues. I think the authors should discuss how surprising it is that the T229A230 position can accommodate various tags while it is probably the place of interaction with other proteins and is playing an important role in the mechanical structural integrity of the actin itself.

      We thank the Reviewer for bringing up this important point. To a certain extent, the conservation argument is true for all of the residues/domains in actin. Any manipulation will change a conserved part of the actin molecule in one way or another and thereby potentially modify its function. This is also evident from the fact that for most of the internally tagged actins, we observed a very poor colocalization with the actin cytoskeleton (Fig. 1). While for the T229/A230, we have not observed any major effects yet, this certainly does not mean that no further changes or defects will be uncovered in future experiments. Nonetheless, since our approach is unique with respect to the fact that it allows isoform-specific tagging without manipulating the N-terminus, our internal tagging system complements the already existing repertoire of actin reporting methods (N-terminal fusion, Lifeact, F-Tractin, actin nanobodies) and allows researchers to study so far unknown properties of actin variants. We have already included in the discussion that, at this point, we can only speculate as to why this variant performs much better than the others (Page 16 of the manuscript) and that possible explanations are the location at the inner domain and the higher structural plasticity of this region as compared to the rest of the molecule, as found during an alanine mutagenesis screen (Rommelaere et al., Structure, 2003).

      - It is now well established that actin plays active and important roles in the nucleus : is ALFA-actin correctly translocated to the nucleus ?

      Planned revision: This is an interesting suggestion. We will perform nuclear-cytosol fractionation experiments and determine whether ALFA-actin is still correctly translocated to the nucleus.

      - OPTIONAL: one may regret that there are no classical in vitro assays, such as pyrene assays, to assess some kinetic parameters on epitope-tagged actins. I guess this would make the paper a bit too large. Although, it will prove useful to better understand how much formin activity is affected (see below)

      For further biochemical characterization and a detailed investigation of the precise assembly kinetics of the tagged actins, we (KD, SP) are already working together to set up in vitro reconstitution experiments. Yet, as also indicated by the Reviewer, we consider these experiments outside of the scope of the current work.

      Minor comments:

      Below are points that could be addressed by the authors to improve the manuscript readability and highlight some important points that are sometimes missing or are not properly discussed:

      -line 40 "...but the distinct N-terminal epitope is not available under native conditions preventing" is a bit too obscure. Can the authors say clearly what is meant by 'native conditions'?

      In our understanding, the term ‘native’ is generally used when referring to conditions in which proteins are in their natural state, without alterations due to heat or denaturants, and possibly also still interacting with their binding partners. We will rephrase to better indicate that in this specific case, we mean that the region that harbors the N-terminus is usually occupied by actin-binding proteins, preventing the binding of the antibody due to steric hindrance.

      - figure 1A : make a clearer correspondence between the number shown in panel A and the amino acid numbers displayed in panel C and G.

      Planned revision: This is a good point, we will add extra annotation in the graph to better link the panels with each other. We will also add additional annotation in Fig. 1D-F for the same purpose.

      - figure 1A : it could be informative to indicate subdomains in this panel.

      Planned revision: We will add the numbers for the subdomains in Fig. 1A.

      - figure 1C : normalized correlation cell : I am not sure I understand how the normalization of the Pearson coefficient is done. It is therefore not clear how can it >1 or >-1 ? This should be clearly explained in the method section of the paper.

      Planned revision: We will better explain the normalization procedure in the Methods section.

      - figure S4 : comes a bit too early when ALFA-actin has not been yet introduced in the main text. Please, reposition this part or provide data with the FLAG-tag version.

      Planned revision: This is a good point and completely overlooked by us. We will introduce this Figure later such that the ALFA tag is already introduced.

      - section starting line 121 : this section should be better motivated = Why are different tags being tested ? This comes later in the discussion, but the reader fails at following the reasoning/motivation here.

      Planned revision: We will add extra motivation for why we added multiple tags.

      - figure 2D, line 145 "We also evaluated actin protein expression in the homozygous ALFA-β-actin cells and this showed that the total amount of β-actin was slightly lower in the ALFA-β-actin cells compared to parental HT1080 cells (Fig. 2C-D)." 'Slightly' is not a very quantitative nor accurate term. please rephrase. Besides, a statistical test for the paired data would also be informative. Besides, data in figure S6B-D indeed show a correlated increase in the expression of Gamma-actin that compensate for the decrease in the Beta-actin level in ALFA-Beta-actin. Can the authors explain why they conclude otherwise?

      Planned revision: This indeed is an important point and we will change the phrasing of this section to provide a more quantitative and accurate description of the western blot quantifications.

      - figure S7B: I am not sure anyone has ever reported measurement of angle of branched actin filament using epifluorescence microscopy. I would remove this panel, or the authors should explain how this measurement can be done objectively.

      We apologize for this misapprehension from our side which is also noted by the other two reviewers. In the treadmilling videos of the lamellipodia in HT1080 cells, which were obtained using Airyscan super-resolution microscopy, we clearly observe a consistent filament formation at a constant angle, something which we interpreted as the angle between the mother filament and the daughter filament. After consulting the literature, we indeed have to admit that this cannot be interpreted as such and we will remove these datasets.

      Planned revision: We will remove the datasets with the angle measurements (Suppl. Fig. 7A-B) from our manuscript.

      - Figure 2F : can the authors comment on the (significant ?) lower value for FLAG-tag actin ?

      The lower value for FLAG-tag actin likely has to do with the properties of the antibody and its suitability for immunofluorescence. For reasons that we do not know, we usually detect more background for the FLAG tag antibody as compared to the other antibodies/ALFA tag nanobody. Since the Pearson correlation coefficient quickly decreases with suboptimal labeling, this is likely the reason that the values for FLAG-actin are lower as compared to the other tagged actins. Importantly, in our biochemistry experiments (F/G-actin), we detect no difference between FLAG-actin and ALFA-actin, indicating that it is the immunofluorescence and sensitive Pearson correlation analysis, rather than the integration of actin, that causes this difference.

      - line 205 "The results from these experiments show that both DIAPH1 and FMNL2 associate with ALFA-β-actin (Fig. 3D),". It is not so obvious that these formins directly interact with monomeric actin via their FH2 domains in co-immunoprecipitation assays. It might very well be mediated by the interaction with profilin, that in turn bind to the FH1 domain of formins. For me, this assay does not make a correct proof that epitope-labelled actin do not interfere with formin activity.

      Planned revision: The point that the co-immunoprecipitation does not demonstrate direct interactions between formins and actin is well taken. We, however, do not claim that this assay proves that formin activity, or formin-based integration of actin monomers, is similar with tagged actin as compared to wildtype actin. Nonetheless, we will critically re-evaluate the relevant passages and rephrase the text to avoid any confusion.

      - figure 5C&D : both graph should use the same scale for the y-axis for easier comparison.

      Planned revision: We will adapt the scale of Fig. 5D to make it identical to Fig. 5C. Following the other suggestions of the Reviewer (and of Reviewer #1), we will also critically evaluate our normalization procedure and present those numbers in the Figures if the values turn out to be different.

      - figure 5D: I think the way the ratio is performed is misleading. Why not look at the Beta/Gamma ratio using the isoform specific antibodies used in parental cells, and show the results for ALFA-Beta-actin and for ALFA-Gamma-actin separately ?

      We kindly refer to our answer to Reviewer #1 on Page 2 for a detailed explanation on the experimental challenge of comparing the localization of wildtype and tagged actin isoforms.

      Planned revision: We will critically evaluate our normalization procedure and present those numbers in the Figures if the values turn out to be different. Furthermore, we will add a different experimental method to show that the tagged isoforms properly localize to actin-based structures. For this, we will attempt to use micropatterned cells to induce clearly defined actin-based structures and also explore the possibilities of investigating the differential localization in double-tagged cells.

      - The limitation observed for unbranched cables in yeast that nanobody-tagged ALFA-actin does not incorporate correctly should be discussed and stressed further in the discussion, as it might prove to be a strong limitation for live-cell imaging to reliably study any type of actin networks.

      We acknowledge the reviewer’s concern regarding the inability of ALFA-tagged actin to incorporate into yeast actin cables when NbALFA is co-expressed and will discuss this point further in the revised manuscript. We have now observed the same limitation for fission yeast actin cables as well. Combined, these observations may reflect a tighter control and sensitivity of yeast formins towards any perturbations in actin size (since NbALFA binds to the ALFA tag with picomolar affinity). To address this issue, and as also suggested by Reviewer 1, we are now creating yeast strains with inducible control of NbALFA expression under GALS/GAL1 promoters and will observe the labelling of actin structures with this approach. Additionally, expression of variants of NbALFA with high dissociation rates may also allow labelling of actin cables and would certainly be worth a try in the future. A structural comparison between mammalian and yeast formins may be required to shed some light on the molecular basis of this fundamental difference.

      However, since this limitation does not occur in the absence of the nanobody (Fig. 5E, Suppl. Fig. 14), we believe that it can be overcome in the future with additional modifications and fast developments in imaging technologies. Thus, IntAct as a labeling strategy represents an advancement over existing labelled actins, with the most important aspect being the identification of the T229/A230 residue pair as permissive for integration of various tags, even ones as large as the GFP11 fragment including a linker (26AA) (Reviewer Fig. 2). Importantly, the T229/A230 site is conserved across many organisms (such as Chlamydomonas reinhardtii, Cryptococcus neoformans, etc.) and may act as a framework to study the actin cytoskeleton, especially in organisms where known surrogate markers like phalloidin and Lifeact may not work or work only suboptimally.

      Reviewer #3 (Significance (Required)):

      General assessment:

      This paper provides a new tagging strategy to monitor actin activity in cells, by specifically inserting the tag along the amino acid sequence.

      Advance:

      This is a very useful tool, as most existing available probes bind to actin in regions that are common to many other actin binding proteins. The authors provide extensive experiments to validate that tagged-actin are functional and do not perturb the actin expression level, actin network architecture nor dynamics.

      Audience:

      This research paper will be of interest to a rather broad audience (many cell biologists) that are either studying actin dynamics or know that actin is involved in the cell functions they study.

      Expertise:

      My expertise is in vitro actin biochemistry.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      This paper tackles a new strategy to tag actin in cells, by identifying that incorporation of a tag of moderate size in subdomain 4 of actin minimally affects actin dynamics in cells, and does not perturb its interaction with known partners, as observed in pull-down assays.

      Major comments:

      The paper is interesting and experiments are convincing.

      My main concerns are the following :

      • Varland et al, is reporting a phosphorylation on Thr229 : I think the authors should mention and discuss this potential PTM that could be affected in IntAct.
      • The sequence in subdomain 4 (the alpha helix containing T229A230) is extremely conserved in animals, as well as in between the 6 human actin isoforms. This usually indicates a strong selection pressure on the residues. I think the authors should discuss how surprising it is that the T229A230 position can accommodate various tags while it is probably the place of interaction with other proteins and is playing an important role in the mechanical structural integrity of the actin itself.
      • It is now well established that actin plays active and important roles in the nucleus : is ALFA-actin correctly translocated to the nucleus ?
      • OPTIONAL: one may regret that there are no classical in vitro assays, such as pyrene assays, to assess some kinetic parameters on epitope-tagged actins. I guess this would make the paper a bit too large. Although, it will prove useful to better understand how much formin activity is affected (see below)

      Minor comments:

      Below are points that could be addressed by the authors to improve the manuscript readability and highlight some important points that are sometimes missing or are not properly discussed :

      • line 40 "...but the distinct N-terminal epitope is not available under native conditions preventing" is a bit too obscure. Can the authors say clearly what is meant by 'native conditions' ?
      • figure 1A : make a clearer correspondence between the number shown in panel A and the amino acid numbers displayed in panel C and G.
      • figure 1A : it could be informative to indicate subdomains in this panel.
      • figure 1C : normalized correlation cell : I am not sure I understand how the normalization of the Pearson coefficient is done. It is therefore not clear how can it >1 or >-1 ? This should be clearly explained in the method section of the paper.
      • figure S4 : comes a bit too early when ALFA-actin has not been yet introduced in the main text. Please, reposition this part or provide data with the FLAG-tag version.
      • section starting line 121 : this section should be better motivated = Why are different tags being tested ? This comes later in the discussion, but the reader fails at following the reasoning/motivation here.
      • figure 2D, line 145 "We also evaluated actin protein expression in the homozygous ALFA-β-actin cells and this showed that the total amount of β-actin was slightly lower in the ALFA-β-actin cells compared to parental HT1080 cells (Fig. 2C-D)." 'Slightly' is not a very quantitative nor accurate term. please rephrase. Besides, a statistical test for the paired data would also be informative. Besides, data in figure S6B-D indeed show a correlated increase in the expression of Gamma-actin that compensate for the decrease in the Beta-actin level in ALFA-Beta-actin. Can the authors explain why they conclude otherwise ?
      • figure S7B: I am not sure anyone has ever reported measurement of angle of branched actin filament using epifluorescence microscopy. I would remove this panel, or the authors should explain how this measurement can be done objectively.
      • Figure 2F : can the authors comment on the (significant ?) lower value for FLAG-tag actin ?
      • line 205 "The results from these experiments show that both DIAPH1 and FMNL2 associate with ALFA-β-actin (Fig. 3D),". It is not so obvious that these formins directly interact with monomeric actin via their FH2 domains in co-immunoprecipitation assays. It might very well be mediated by the interaction with profilin, that in turn bind to the FH1 domain of formins. For me, this assay does not make a correct proof that epitope-labelled actin do not interfere with formin activity.
      • figure 5C&D : both graph should use the same scale for the y-axis for easier comparison.
      • figure 5D: I think the way the ratio is performed is misleading. Why not look at the Beta/Gamma ratio using the isoform specific antibodies used in parental cells, and show the results for ALFA-Beta-actin and for ALFA-Gamma-actin separately ?
      • The limitation observed for unbranched cables in yeast that nanobody-tagged ALFA-actin does not incorporate correctly should be discussed and stressed further in the discussion, as it might prove to be a strong limitation for live-cell imaging to reliably study any type of actin networks.

      Significance

      General assessment:

      This paper provides a new tagging strategy to monitor actin activity in cells, by specifically inserting the tag along the amino acid sequence.

      Advance:

      This is a very useful tool, as most existing available probes bind to actin in regions that are common to many other actin binding proteins. The authors provide extensive experiments to validate that tagged-actin are functional and do not perturb the actin expression level, actin network architecture nor dynamics.

      Audience:

      This research paper will be of interest to a rather broad audience (many cell biologists) that are either studying actin dynamics or know that actin is involved in the cell functions they study.

      Expertise:

      My expertise is in vitro actin biochemistry.

  3. Aug 2023
    1. The assumption is that the Grand Canyon is a remarkably interesting and beautiful place and that if it had a certain value P for Cárdenas, the same value P may be transmitted to any number of sightseers—

      Not everyone values the same exact things. Each human being sees, feels, and reacts differently. I think Percy is trying to explain that we as humans value different things based on the relativity that it has to us. For some they value the Grand Canyon because that's what they like and some value insulin because that's what they need.

    2. As Mounier said, the person is not something one can study and provide for; he is something one struggles for. But unless he also struggles for himself, unless he knows that there is a struggle, he is going to be just what the planners think he is.

      I think stereotyping is also another way to think of this. We can't always assume we know someone by how they present themselves because then when we try to know them, our preconceived notions are thrown off. If we don't try to break out of the mold, break out of comfort, then we may always be under control of stereotypes and what others assume of us.

    1. Many students will indeed respond to a scolding by behaving better, but for others, scolding may be a reward for misbehavior that actually increases it.

      This concept was something I felt I related to personally while reading. I have spent the last 4 years working as a full time paraprofessional/teaching assistant...3 years in first grade and one year in kindergarten. Reading this segment brought me back to my time spent in a classroom, and I almost immediately thought of a particular student who fit this scenario very well. This student would at times act out and of course, wanting to maintain classroom rules, we would correct this behavior. At first glance many would think this was the appropriate thing to do. But as the year went on we discovered the more we responded to these negative behaviors, the more he did them. Psychologically, he knew that if he misbehaved, he would get a reaction from us. He didn't seem to care whether it was a negative or positive one, he was just initiating behaviors he knew would get a reaction, which this portion of text highlights and I enjoyed being able to read something that I felt I could personally connect to.

    2. Some people think that good teachers are born that way. Outstanding teachers sometimes seem to have a magic, a charisma that mere mortals could never hope to achieve. Yet research has begun to identify the specific behaviors and skills that make a “magic” teacher

      I chose to comment on this particular portion of text, because I do not believe there is such a thing as a good or bad teacher. I believe instead of using the term "good teacher" we should instead practice labeling educators as "knowledgable". For example, instead of saying Miss McGuire is a really good teacher, we could say she is a very knowledgable teacher. Everyone has a different definition of what a good teacher is, and by looking at how knowledgable they are on current teaching practices or how knowledgable they are about successful classroom management skills, it separates the idea of being good vs bad simply because they don't teach something the way their observer or peers may.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank both reviewers and the editor for their time and effort in carefully reviewing and comprehending our manuscript. We are grateful for their thorough assessment, as well as the insightful questions and suggestions they have provided. We have taken into account the questions and comments raised by the reviewers, and we have incorporated the necessary revisions accordingly. In the following pages the reviewers’ comments are italicized. Our replies are in normal script.

      In addition to revisions suggested by reviewers we also added a new summary schematic (Fig 8) and minor changes to acknowledgments.

      Reviewer 1

      This is a very strong study with few concerns. Regarding DN1+ T cell function, the authors assessed IFN-γ and activation markers, but it is unclear if the cells are polyfunctional (produced high levels of other cytokines at 6 weeks) or if there were changes in the humoral response (serum Ab titers or size/ number of germinal centers.)

      Thank you for your thorough assessment of our work and your kind comments.

      a. We observed a decreased IFN-γ and TNF-α production in antigen experienced DN1 T cells compared to naïve DN1 T cells, which is consistent with findings in Tfh cells.

      b. We tested for anti-MA IgM and IgG production but did not observe an increase in these antibodies in the vaccinated setting. It is possible that additional inflammatory stimulation, such as from an adjuvant or infection, may be necessary to trigger antibody levels sufficient for detection by ELISA.

      c. We did not measure the number or size of germinal centers in this study, but future investigations could explore this aspect.

      Reviewer 2

      1. Authors elaborate the introduction solely highlighting the relevance of antigen persistence in the context of vaccination. However, it is well known that several mycobacterial antigens (Lipids and proteins) can cause detrimental responses when overexposed to the immune system. In this regard, it would be appropriate to introduce the possibility of the occurrence of exhaustion when prolonged exposure to antigens is happening, which is the main theme of this paper.

      Thank you for bringing these points to our attention. We have added a paragraph in the discussion section (page 15-16, line 372-386), addressing the implications of our findings in relation to exhaustion in the context of antigen persistence during chronic viral infections. We have also provided an example involving the lipid trehalose 6,6’-dibehenateled (TDM), a known virulence factor for Mtb, which has been utilized in several subunit vaccines without demonstrating significant toxicity.

      1. Authors need to provide more information about the source of MA. It is briefly mentioned in the materials and methods section that it was obtained from Sigma. If that is the case, it would be ideal to show the integrity of the polysaccharide in terms of balance and abundance between different MA species.

      We obtained M. tuberculosis MA from Sigma, which comprises α-, keto-, and methoxy MA forms with an average combined lipid tail length of 80 carbons. MA-specific T cells that preferentially recognize these three forms of MA have been identified in humans. We have provided more detailed information regarding the MA in the Materials and Methods section (page 17, line 429-431).

      1. Building up on the previous comment, MA is a complex mixture of polysaccharides including multiple lengths of fatty acids and modifications. Could the authors comment on the potential variability of MA structure and potential impact on immune responses?

      The binding capacities of Group 1 CD1-restricted T cells can be influenced by various factors, including specific head groups, lipid tail length, and structure of the lipid tail. Notably, DN1 T cells have been shown to have higher binding affinities towards keto and methoxy MA, while displaying weaker binding to α-MA (Van Rhijn et al., 2017, Eur. J. Immunol. 47:1525). In our study, we successfully utilized a mixture of MA to activate DN1 T cells, indicating that the required subtypes of MA were present in sufficient quantities to elicit this activation. In future investigations focusing on the polyclonal immune response, incorporating a mixture of MA and possibly other Mtb lipid antigens will enable a broader spectrum of T cell activation. This, in turn, is expected to enhance the overall effectiveness and robustness of protection in challenge experiments.

      1. How do the authors explain the lack of stimulation of cell proliferation induced by MA-PLGA formulation? Does this result contradict previous findings?

      This study represents the first instance of utilizing PLGA as a delivery system for a lipid antigen via a pulmonary vaccine route, despite its previous applications in numerous other vaccine formulations. Therefore, we do not think our findings contradict any existing research in the field. It is worth noting that the immunogenicity of PLGA can be influenced by the specific polymer chemistry and formulation, which may account for potential variations in the observed effects. We have added additional text to the discussion (page 13, line 310 – 313) to address this point.

      1. Fig 3. Authors switch to IT administration simply arguing against the limitation of IN delivery regarding its low volume. However, administration via IN could be done in an iterative manner. According to this change, this reviewer asks whether the performance of MA-PLGA could now be comparable to BCN-MA using IT instead.

      PLGA possesses an inherent background adjuvant effect, which may not be ideal for precisely stimulating group 1 CD1-restricted T cells, as a considerable proportion of these T cells exhibit some level of autoreactivity (Li, et al, 2011, Blood 118:3870, De Lalla et al., 2011, Eur. J. Immunol. 41:602; de Jong et al, 2010, Nat. Immunol. 11:1102). Notably, our observations revealed that blank PLGA-NP exerted a significant stimulatory effect on both mouse (DN1) and human (M11) MA-specific T cells (Fig. 2A-D). This underscores the advantage of the BCN system, which lacks detectable adjuvant effects and enables a more controlled, dose-dependent augmentation of T cell responses with increasing concentrations of loaded MA. Therefore, we did not further evaluate the impact of PLGA-MA using the IT route of vaccination.

      1. What would explain the apparent lack of a role for NP encapsulation in the persistence of MA?

      In this study, we have provided evidence to support the notion that encapsulation plays a role in antigen persistence, as demonstrated in Fig. 5A-C. Specifically, we directly compared the persistence of MA when delivered encapsulated in BCNs versus without encapsulation in BCNs, using DC pulsing and IT vaccination as the delivery methods. Our results indicate that at 6 weeks post-vaccination, MA encapsulated in BCNs can activate DN1 T cells, while free MA does not. These findings may initially appear to be contradictory to those depicted in Fig. 5D-F, where antigen persistence is observed following vaccination with attenuated Mtb. However, we propose that the attenuated Mtb bacteria may function similarly to nanoparticles by encapsulating and containing MA, thereby facilitating its persistence within the host. We appreciate the opportunity to clarify these points (page 15, line 364-367). Encapsulation within PEG-PPS NP may also contribute to two additional mechanisms. First, we have demonstrated that PEG-PPS NPs target myeloid cell populations (Burke et al., 2022, Nat. Nano. 17:319), such as alveolar macrophages, that can serve as antigen persistence depots as well as present CD1b/MA complexes on their surfaces. NPs allow more efficient delivery to these cells, whereas otherwise the lipid would bind to albumin, HDL, LDL, and other lipid carriers in blood for a broader, non-specific biodistribution, which would include cells less efficient at antigen persistence or presentation. Second, we previously demonstrated that the BCN nanostructure is highly stable within cells, supporting a slow intracellular release (Bobbala et al., 2020, Nanoscale 12:5332). This could assist with a more sustained presentation of lipid antigen by targeted cells in contrast to free form lipid or NPs (like PLGA) that rapidly degrade within cells. Indeed, low levels of fluorescently tagged BCNs were still detectable 6 weeks post-vaccination (Fig. 6B). 
      Our future studies will further investigate this hypothesis.

      1. Authors need to discuss to what extent the MA location into AM is route dependent.

      The localization of MA within alveolar macrophages (AMs) in the lung is likely specific to intratracheal (IT) vaccination. Therefore, mice vaccinated subcutaneously (SC) or intravenously (IV) may possess distinct antigen persistence depots. We have made modifications to the discussion section to further emphasize this point (page 15, line 359-364).

      1. Also, AM are programmed to sustain low immune responses because of their unique location in the lung. In fact, Mtb uses this to replicate while immune response is mounted. In this regard, accumulation of MA into this compartment may not be relevant for the overall immune response. In other words, what would be the contribution of this population to the T cell activation?

      It is likely that AMs primarily function as antigen depots and do not directly contribute to the activation of DN1 T cells. This assertion is supported by our findings, as co-culturing AMs with DN1 T cells alone did not result in T cell activation (Fig. 6E). However, we observed that the presence of hCD1Tg-expressing bone marrow-derived dendritic cells was necessary for DN1 T cell activation in vitro, which likely reflects a similar phenomenon occurring in vivo.

      1. Could the T cells responses measured be due to the reduced fraction of DC loaded with BCN-MA at initial time points?

      Regarding the T cell response observed in Fig. 5A-C, where we used DCs to deliver either free MA or MA-BCN, we took steps to address potential differences in loading capacity between the two at initial time points. Specifically, DCs were pulsed with 10 µg/mL of free MA or 5 µg/mL of MA-BCN (the figure legend has been modified to clarify this point, page 37, line 962 - 963). To confirm approximate equivalence in loading, we examined the immune response one week after vaccination and found no statistically significant difference between the two methods.

    1. Reviewer #1 (Public Review):

      Murphy, Fancy and Skene performed a reanalysis of snRNA-seq data from Alzheimer Disease (AD) patients and healthy controls published previously by Mathys et al. (2019), arriving at the conclusion that many of the transcriptional differences described in the original publication were false positives. This was achieved by revising the strategy for both quality control and differential expression analysis. I believe the authors' intention was to show the results of their reanalysis not as a criticism of the original paper (which can hardly be faulted for their strategy which was state-of-the-art at the time and indeed they took extra measures attempting to ensure the reliability of their results), but primarily to raise awareness and provide recommendations for rigorous analysis of sc/snRNA-seq data for future studies.

      STRENGTHS:

      The authors demonstrate that the choice of data analysis strategy can have a vast impact on the results of a study, which in itself may not be obvious to many researchers.

      The authors apply a pseudobulk-based differential expression analysis strategy (essentially, adding up counts from all cells per individual and comparing those counts with standard RNA-seq differential expression tests), which is (a) in line with latest community recommendations, (b) different from the "default options" in most popular scRNA-seq analysis suites, and (c) explains the vastly different number of DEGs identified by the authors and the original publication. The recommendation of this approach together with a detailed assessment of the DEGs found by both methodologies could be a useful finding for the research community. Unfortunately, it is currently not fully substantiated and is confounded with concurrent changes in QC measures (see weaknesses).
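
      The aggregation step described above can be sketched as follows. This is only an illustration of the general idea of summing counts per individual, not the code used in the reanalysis; the function name, the gene and donor labels, and the toy counts are all hypothetical, and the downstream differential expression test on the summed counts is omitted.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_individual):
    """Sum per-cell gene counts into one count vector per individual.

    cell_counts: {cell_id: {gene: count}}
    cell_to_individual: {cell_id: individual_id}
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell_id, gene_counts in cell_counts.items():
        individual = cell_to_individual[cell_id]
        for gene, count in gene_counts.items():
            totals[individual][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {ind: dict(genes) for ind, genes in totals.items()}


# Hypothetical toy data: three cells of one cell type from two donors.
cell_counts = {
    "cell1": {"GENE_A": 3, "GENE_B": 0},
    "cell2": {"GENE_A": 1, "GENE_B": 2},
    "cell3": {"GENE_A": 5, "GENE_B": 1},
}
cell_to_individual = {"cell1": "donor1", "cell2": "donor1", "cell3": "donor2"}

bulk = pseudobulk(cell_counts, cell_to_individual)
```

      After aggregation the unit of replication is the individual rather than the cell, which is why a rare cell type with few cells and few reads per cell yields noisy pseudobulk counts.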

      The authors show a correlation between the number of DEGs and the number of cells assessed, which indicates a methodological shortcoming of the original paper's approach (actually, the authors of the original paper already acknowledged that the lesser number of DEGs for rare cell types was a technical artefact). To be educational for the reader it would be important to provide more information about the DEGs that were "found" and those that were "lost". Given vast inter-individual heterogeneity in humans, it is likely that the study was underpowered to find weaker differences using the pseudobulks (Fig. 1B shows that only genes with more than 4-fold change were found "significant").

      All code and data used in this study are publicly available to the readers.

      WEAKNESSES:

      The authors interpret the fact that they found fewer DEGs with their method than the original paper as a good thing by making the assumption that all genes that were not found were false positives. However, they do not prove this, and it is likely that at least some genes were not found due to a lack of statistical power and not because they were actually "incorrect". The original paper also performed independent validations of some genes that were not found here.

      I am concerned that the only DEGs found by the authors are in the rare cell types, foremost the rare microglia (see Fig. 1f). It is unclear to me how many cells the pseudo-bulk counts were based on for these cell types, but it seems that (a) there were few and (b) there were quite few reads per cell. If both are the case, the pseudobulk counts for these cell populations might be rather noisy and the DEG results are liable to outliers with extreme fold changes.

      The authors claim they improved the quality control of the dataset. While I do not think they did anything wrong per se, the authors offer no objective metric to assess this putative improvement. This is another major weakness of the paper as it confounds the results of the improved (?) differential analysis strategy and dilutes the results. I detail this weakness in the two following points:

      Removing low-quality cells: The authors apply a new QC procedure resulting in the removal of some 20k more cells than in the original publication. They state "we believe the authors' quality control (QC) approach did not capture all of these low quality cells" (l. 26). While all the QC metrics used are very sensible, it is unclear whether they are indeed "better". For instance, removing cells whose mitochondrial read fraction exceeds 5% seems harsh and might account for a large proportion of the additional cells filtered out in comparison to the original analysis. There is no blanket "correct cutoff" for this percentage. For instance, the "classic" Seurat tutorial https://satijalab.org/seurat/articles/pbmc3k_tutorial.html uses the 5% threshold chosen by the authors, a MAD-based cutoff selection arrived at 8% here https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html, another "best practices" guide chooses 10% by default https://bioconductor.org/books/3.17/OSCA.basic/quality-control.html#quality-control-discarded, etc. Generally, the % of mitochondrial reads varies a lot between datasets. As far as I can tell, the original paper did not use a fixed threshold but instead used a clustering approach to identify cells with an "abnormally high" mitochondrial read fraction. That also seems reasonable. Overall, I cannot assess whether the new QC is really more appropriate than the original analysis and the authors do not provide any evidence in favor of their strategy.
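
      To illustrate why a data-driven cutoff can differ from a fixed 5% threshold, here is a minimal sketch of a MAD-based upper cutoff. It is not the procedure of any of the guides linked above: the function name, the choice of 3 MADs, the use of the unscaled MAD, and the toy percentages are all assumptions made for illustration.

```python
from statistics import median


def mad_upper_cutoff(values, n_mads=3.0):
    """Return median + n_mads * MAD, using the raw (unscaled) MAD."""
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med + n_mads * mad


# Hypothetical mitochondrial read percentages for six cells.
mito_pct = [1.0, 2.0, 2.0, 3.0, 4.0, 20.0]

cutoff = mad_upper_cutoff(mito_pct)
kept = [v for v in mito_pct if v <= cutoff]
```

      Because the cutoff is derived from the distribution in each dataset, two datasets with different baseline mitochondrial content end up with different thresholds, which is the point made above about there being no blanket "correct cutoff".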

      Batch correction: "Dataset integration has become a standard step in single-cell RNA-Seq protocols" (l. 29). While it is true that many authors now choose to perform an integration step as part of their analysis workflow, this is by no means uncontroversial as there is a risk of "over-integration" and loss of true biological differences. Also, there are many different methods for dataset integration out there, which will all have different results. More importantly, the authors go on "we found different cell type proportions to the authors (Fig. 1a) which could be due to accounting for batch effects" but offer no support for the claim that the batch effects are indeed related to the observed differences. An alternative explanation would be a selective loss/gain of certain cell types during quality control. The original paper stated concerns about losing certain cell types (microglia, which do not seem to be differentially abundant in the original paper / new analysis).

      Relevant literature is incompletely cited. Instead of referring to reviews of best practices and benchmarks comparing methods for batch correction and/or differential analysis, the authors only refer to their own previous work.

      Due to a lack of comparison with other methods and due to the fact that the authors' methodology was only applied to a single dataset, the paper presents merely a case study, which could be useful but falls short of providing a general recommendation for a best practice workflow.

      APPRAISAL:

      The manuscript could help to increase awareness of data analysis choices in the community, but only if the superiority of the methodology was clearly demonstrated. The recommended pseudobulk differential expression approach along with the indication of drastic differences that this might have on the results is the main output of the current manuscript, but it is difficult to assess unequivocally how this influenced the results because the differential analysis comes after QC and cell type annotation, which have also been changed in comparison to the original publication. In my opinion, the purpose of the paper might be better served by focusing on the DE strategy without changing QC and instead detailing where/how DEGs were gained/lost and supporting whether these were false positives.

    1. Reviewer #1 (Public Review):

      Summary: This paper performs fine-mapping of the silkworm mutants bd and its fertile allelic version, bdf, narrowing down the causal intervals to a small interval of a handful of genes. In this region, the gene orthologous to mamo is impaired by a large indel, and its function is later confirmed using expression profiling, RNAi, and CRISPR KO. All these experiments are convincingly showing that mamo is necessary for the suppression of melanic pigmentation in the silkworm larval integument.

      The authors also use in silico and in vitro assays to probe the potential effector genes that mamo may regulate.

      Strengths: The genotype-to-phenotype workflow, combining forward (mapping) and reverse genetics (RNAi and CRISPR loss-of-function assays) linking mamo to pigmentation are extremely convincing.

      Weaknesses:

      1) The last section of the results, entitled "Downstream target gene analysis" is primarily based on in silico genome-wide binding motif predictions. While the authors identify a potential binding site using EMSA, it is unclear how much this general approach over-predicted potential targets. While I think this work is interesting, its potential caveats are not mentioned. In fact the Discussion section seems to trust the high number of target genes as a reliable result. Specifically, the authors correctly say: "even if there are some transcription factor-binding sites in a gene, the gene is not necessarily regulated by these factors in a specific tissue and period", but then propose a biological explanation that not all binding sites are relevant to expression control. This takes a radical shortcut by assuming that predicted binding sites are actual in vivo binding sites. This may not be true, as I'd expect that only a subset of binding motifs predicted by Position Weight Matrices (PWMs) are real in vivo binding sites with a ChIP-seq or Cut-and-Run signal. This is particularly problematic for PWMs that feature only 5-nt signature motifs, as inferred here for mamo-S and mamo-L, simply because we can expect many predicted sites by chance.

      2) The last part of the current discussion ("Notably, the industrial melanism event, in a short period of several decades ... a more advanced self-regulation program") is flawed with important logical shortcuts that assign "agency" to the evolutionary process. For instance, this section conveys the idea that phenotypically relevant mutations may not be random. I believe some of this is due to translation issues in English, as I understand that the authors want to express the idea that some parts of the genome are paths of least resistance for evolutionary change (e.g. the regulatory regions of developmental regulators are likely to articulate morphological change). But the language and tone are made worse by the mention that in another system, a mechanism involving photoreception drives adaptive plasticity, making it sound like the authors want to make a Lamarckian argument here (inheritance of acquired characteristics), or a point about orthogenesis (e.g. the idea that the environment may guide non-random mutations). Because this last part of the current discussion suffers from confused statements on modes and tempo of regulatory evolution and is rather off topic, I would suggest removing it.

      In any case, it is important to highlight here that while this manuscript is an excellent genotype-to-phenotype study, it has very few comparative insights on the evolutionary process. The finding that mamo is a pattern or pigment regulatory factor is interesting and will deserve many more studies to decipher the full evolutionary story behind this Gene Regulatory Network.
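
      To make the chance-match concern in point 1 concrete, the sketch below estimates how often a fixed short motif is expected to occur in random sequence. It is a back-of-the-envelope calculation under stated assumptions (independent, equiprobable bases, exact matches to a single non-palindromic motif, both strands scanned); the function name and the 2,000-bp window are hypothetical, and a real PWM with degenerate positions would match even more often.

```python
def expected_chance_matches(seq_length, motif_length, both_strands=True):
    """Expected exact matches of one fixed motif in random sequence.

    Assumes independent bases with A, C, G and T each at probability 0.25.
    """
    windows = seq_length - motif_length + 1
    if windows <= 0:
        return 0.0
    per_window = 0.25 ** motif_length  # probability of a match at one position
    strands = 2 if both_strands else 1
    return strands * windows * per_window


# Hypothetical 2,000-bp upstream region scanned on both strands.
expected_5nt = expected_chance_matches(2000, 5)
expected_10nt = expected_chance_matches(2000, 10)
```

      Under these assumptions a 5-nt motif is expected to occur several times per 2 kb by chance alone, whereas a 10-nt motif is expected far less than once, which is why short signature motifs predict so many candidate targets.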

      Minor Comment :

      The gene models presented in Figure 1 are obsolete, as there are more recent annotations of the Bm-mamo gene that feature more complete intron-exon structures, including for the neighboring genes in the bd/bdf intervals. It remains true that the mamo locus encodes two protein isoforms. An example of the Bm-mamo locus annotation can be found at: https://www.ncbi.nlm.nih.gov/gene/101738295 RNAseq expression tracks (including from larval epidermis) can be displayed in the embedded genome browser from the link above using the "Configure Tracks" tool.

      Based on these more recent annotations, I would say that most of the work on the two isoforms remains valid, but FigS2, and particularly Fig.S2C, need to be revised.

  4. cqpress-sagepub-com.lmc.idm.oclc.org
    1. Proponents see two main advantages: One is that police, as generalists, are not trained to respond to every type of domestic or mental health crisis. Having others carry part of the load should free officers up to respond when and where they are really needed, such as violent situations, Travis says. Cherelle Parker, a City Council member in Philadelphia, agrees, saying: “We're not asking police officers to become psychiatrists, psychologists and therapists — we can get those who are experts in those areas to address those issues. When mental and behavioral health is needed, we now have another vehicle that we can use.”

      I think it is good that we are realizing police cannot do everything, just as a doctor or a nurse does not do everything. Yes, you can have a general practitioner, but there are still tasks and jobs they don't do. I feel this same strategy with police would allow them to specialize in the cases where they are needed, or have others who are more proficient complete those tasks.

    1. The social media landscape continues to evolve dramatically, with new social networks like TikTok entering the field as well as existing platforms like Instagram and Telegram gaining markedly in popularity among young audiences. As social natives shift their attention away from Facebook (or in many cases never really start using it), more visually focused platforms such as Instagram, TikTok, and YouTube have become increasingly popular for news among this group. Use of TikTok for news has increased fivefold among 18–24s across all markets over just three years, from 3% in 2020 to 15% in 2022, while YouTube is increasingly popular among young people in Eastern Europe, Asia-Pacific, and Latin America.

      I remember when YouTube had to do something about fake news after the big event on January 6th. They made rules to take down videos with false information. That's good because we want to know the right stuff. Also, TikTok gives users content creation freedom and more freedom of speech, and we can make our own videos and say what we think. But sometimes, that can also be a problem. Since we can post anything, some things might not be true, like gossip about famous people or even important things like politics. TikTok does not always check if things are real before they spread; however, I do sometimes see a warning on a video if it may cause bodily harm when attempted at home.

    2. Here, we aim to unpack these new behaviours as well as to dismantle some broad narratives of ‘young people’. Instead, we consider how social natives (18–24s) – who largely grew up in the world of the social, participatory web – differ meaningfully from digital natives (25–34s) – who largely grew up in the information age but before the rise of social networks – when it comes to news access, formats, and attitudes.2 These groups are critical audiences for publishers and journalists around the world, and for the sustainability of the news, but are increasingly hard to reach and may require different strategies to engage them.

      This part of the article really caught my attention. I'm 28, and I remember a time when social media was almost nonexistent. I think it was really interesting watching social media platforms become a staple. I can definitely relate to the sentence, "... (digital natives) grew up in the information age...", because that's how I viewed the online world. If I wanted to learn about something new or keep up with topics, I had to search and dig through many websites to find one that I not only liked but also trusted.

    3. Here, we aim to unpack these new behaviours as well as to dismantle some broad narratives of ‘young people’. Instead, we consider how social natives (18–24s) – who largely grew up in the world of the social, participatory web – differ meaningfully from digital natives (25–34s) – who largely grew up in the information age but before the rise of social networks – when it comes to news access, formats, and attitudes.2

      Social natives may be more interactive in social media and more inclined to socialize on their devices instead of in person. This is my hypothesis because I see a lot of younger people Facetiming and utilizing social media more than I do, and even though I could be as active as them, I am just not as inclined to participate on a more personal account. I would be more inclined to participate promoting a company I work for on social media than interacting with it as much as the younger generation for myself. I think that digital natives may be more skilled at media literacy because they have some background knowledge on what existed before the floods of mass misinformation from social media on platforms like Twitter and Facebook- while seeing how misinformation was perpetuated in the media for the generation older than them.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their time, the positive reviews and the useful comments. We answer below and explain the changes made to the manuscript. The comments of the reviewers are in italics.

      Reviewer #1

      1. 'For GWAS, the strains that were fertile after 20 generations were considered non-Mrt.' One aspect of Fig 1D that could be clarified are the dots at generation 21. If these represent strains that were always fertile at generation 21, then perhaps give these a different color to indicate that sterility was never observed?

      Response: This is a good idea. We added colors in Figure 1, which makes it clearer.

      We also provide a different color for surviving replicates in all relevant figures.

      1. 'The mean Mrt values of strains ranged from sterile at 3 generations to fertile after 20 generations at 25°C, with a skewed distribution toward high values (Figure 1B).' Based on Table S2, part of the explanation for this skewed distribution in later generations is that some strains became sterile rapidly for some blocks, whereas the same strain did not become sterile in other blocks. For example, JU1200, JU360, PB303. I suggest providing a second color for Fig. 1D for strains that sometimes displayed sterility and sometimes did not.

      Response: We have now colored the isolates that never became sterile, with the same color code as in panel B. Because we stopped the scoring at G20 and coded fertility at G20 as '21', those with a mean below 21 showed some sterility in at least one case.

      Because the number of generations at which we stopped the phenotyping (20) is arbitrary, the fact a line stayed fertile at 20 generations in one replicate is not very meaningful, especially considering that the number of replicates is not the same for all strains. The key point of the variance graph is to show that the strains with the most variance are those with high but

      For those that were sometimes fertile and sometimes sterile, I suggest creating a graph in Figure 1 that shows generations at sterility or lack of sterility, color coded by block. This will allow the significance of strains with high generation Mrt values to be better appreciated for readers who do not look at the supplementary table.

      Response: Yes, we added this graph in Figure S1. This is indeed useful.

      1. The GWAS section could benefit from a simple explanation of the premise of GWAS for non-specialist readers.

      Response: Yes, we added: "A genome-wide association study (GWAS) is a genetic mapping that uses the natural diversity of a panel of organisms of a given species to test for statistical independence between the allelic state of polymorphic markers and the phenotype of interest (Andersen and Rockman 2022). A statistical association between the marker and the phenotype indicates that a polymorphism tightly linked to the marker in the data (i.e. in linkage disequilibrium with it) causes the variation in phenotype. For statistical reasons, GWAS can only detect polymorphisms that are at intermediate frequencies in the panel, i.e. cases where both alleles occur at frequencies higher than 5%. We only used such polymorphisms in the GWAS (see Methods)."
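
      The premise in this added text can be sketched in a few lines. This is a deliberately simplified illustration and not the mapping pipeline used in the study: it assumes homozygous strains coded 0/1 at each marker, uses a plain difference in mean phenotype instead of a mixed model that accounts for population structure, and all marker names, phenotype values, and function names are hypothetical.

```python
from statistics import mean


def minor_allele_frequency(genotypes):
    """genotypes: list of 0/1 allele states, one per homozygous strain."""
    freq_alt = sum(genotypes) / len(genotypes)
    return min(freq_alt, 1 - freq_alt)


def mean_difference(genotypes, phenotypes):
    """Mean phenotype of alternative-allele strains minus reference strains."""
    ref = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    alt = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    return mean(alt) - mean(ref)


def scan(markers, phenotypes, min_maf=0.05):
    """Keep markers where both alleles exceed min_maf, then score each one."""
    results = {}
    for name, genotypes in markers.items():
        if minor_allele_frequency(genotypes) <= min_maf:
            continue  # too rare in the panel to test reliably
        results[name] = mean_difference(genotypes, phenotypes)
    return results


# Hypothetical panel of 25 strains; phenotype is generations until sterility,
# with 21 standing for "still fertile at generation 20".
phenotypes = [21] * 15 + [5] * 10
markers = {
    "m_common": [0] * 15 + [1] * 10,  # both alleles frequent
    "m_rare": [1] + [0] * 24,         # minor allele at 4%, filtered out
}

results = scan(markers, phenotypes)
```

      In an actual study the per-marker score would be a test statistic with a p value, corrected for the number of markers tested.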

      And further down:

      "To diminish the multiple testing burden, the initial analysis in Figure 1E used a restricted set of markers, after pruning those that were in high linkage to each other."

      1. One problem might be that the Mrt phenotype is widespread among wild strains. To the authors' credit, they consider results observed in different laboratories as valid, even when the results do not agree. If the Mrt phenotype is influenced by the environment, then some laboratory environments might result in 'false negative' Mrt results that could be ignored in favor of positive results from another lab that appear strong. Might focusing on strains with a set of strong positive results from one lab allow the authors to draw stronger GWAS conclusions?

      2. The authors perform GWAS based on the variance of the Mrt phenotype data. Would the GWAS data be more illuminating if the authors only considered strains that become sterile fairly rapidly, within 10 generations? The authors might then have a second category that included strains that become sterile from generation 11-20. If the genetic basis for the Mrt phenotypes is the same, then GWAS of strains that become sterile in less than 10 generations might yield similar peaks as GWAS for strains that become sterile between generations 11-20.

      Response: These two comments are strongly related so we answer them together. Note that the GWAS is not mapping the variance values but the Mrt values themselves.

      We actually initially only used block 1 (a single replicate, all strains performed in parallel in our laboratory) and also detected the chromosome III association using a categorical variable (threshold at 11), but decided to show the results with all data to maximize power, taking into account the generation value and block effects.

      We investigated other ways to code the data (e.g. categorically) and removing the strains of the most variable middle category, as proposed by the reviewer. This changed the p values and the rank of the markers on chromosome III but not the overall result.

      In summary, we did a variety of tests, which pointed to chromosome III, a region that was validated using crosses (Figure 2).

      Note that in the revision, we updated the GWAS plot and fine mapping table as we noticed a few problems in our previous mapping. 1) We removed 3 isolates that were classified in Lee et al. 2021 as divergent. 2) We included strains that had been lost in the pipeline because their names did not match CeNDR isotypes. This increased the significance of the chromosome III peak.

      Response: There was no comment 6.

      1. 'We did not investigate whether a second locus present in JU775 on the right arm of Chr III might have a lesser effect.'

      Response: We are not sure what the reviewer meant. Considering the difficulties with the stronger effect locus, we did not try to study loci with a weaker effect.

      1. It might be interesting to test the memory of growth on beneficial bacteria on JU4134, which had a Mrt phenotype that was strongly suppressed by the beneficial bacteria.

      Response: We agree that testing other strains would be useful but given the duration of such experiments (30 generations and two weeks of preparation before), we respectfully decline to perform this experiment that does not seem strictly necessary.

      1. The Mrt phenotype of mutants in small RNA inheritance and histone modifying enzymes 'appears however distinct from that of the prg-1/piwi mutant (for which the cause of sterility is debated), especially the latter does not show temperature dependence and is suppressed by starvation.' While it is true that the cause of sterility is debated for the prg-1/piwi mutant, this mutant is defective for small RNA silencing and likely has parallels with some defects in histone modifying enzymes. Anecdotal reports suggest that starvation might affect the Mrt phenotype or longevity of histone modifying enzyme mutants. Moreover, the cause of sterility is not clear for small RNA inheritance and histone modifying enzyme mutants. It is fair to say that the distinction between temperature-sensitivity or lack of temperature sensitivity of small RNA mutants is not understood. Could the authors please comment here about whether any of the wild strains display sterility at 20°C.

      __Response: __The temperature-dependence of the wild isolates is progressive between 20-25°C. We previously showed that strains with a very strong Mrt phenotype, such as QX1211, can display sterility at 20°C (Figure 1B in Frézal et al. 2018). However, its Mrt phenotype is still temperature-dependent as the sterility occurs much earlier at 25°C.

      1. If intracellular bacteria are simply somatic, then how is it that they are transmitted to progeny. If they are released into the environment and then consumed by hatched larvae, this is soma-to-soma transmission.

      __Response: __These microsporidia (which are eukaryotes related to fungi) are indeed transmitted horizontally. To make this clear, we added: "colonizing its intestinal cells and being transmitted horizontally via defecation and ingestion of spores". The soma-to-germline interaction concerns the effect of microsporidia on germline maintenance.

      Minor: 1. 'We measured the mortal germline (Mrt) phenotype'. Mortal Germline (Mrt)

      __Response: __It is unclear whether phenotypes start with a capital letter when they are written in full words. We did write phenotypes with a capital letter in previous works but have changed because C. elegans nomenclature rules (https://cgc.umn.edu/nomenclature) suggest that they should not: "Phenotypic characteristics can be described in words, e.g., dumpy animals or uncoordinated animals." For the mortal germline phenotype in particular, we find several ways to write it in articles (with 0, 1 or 2 capital letters, including by the three reviewers). We are happy to change it if required.

      Reviewer #2

      Major comments: The authors claimed that the variants causing Mrt exist at intermediate frequency in the natural population but the evidence supporting this claim is rather limited.

      __Response: __Thank you for this comment as it helped us clarify the manuscript.

      To better explain the notion of intermediate frequency in the GWAS, we added an explanation of the principle of the GWAS (see above) and again in the Discussion: "The intermediate frequency of the candidate alleles derives from the GWAS approach, which cannot detect rare alleles, such as set-24, that are present in a single strain of the dataset."

      We also illustrated the frequency by adding a plot (Fig. 1F) showing the association of the most associated candidate SNP, with a visual depiction of the frequency. We further added in Results: "For SNPs with a high significance (p-4) in the fine mapping, the frequency of the Mrt associated allele was comprised between 21 and 41% in our GWAS strain set (Table S3); as an example, the Mrt allele of the associated SNP shown in Figure 1F (III:4677491) displayed a frequency of 29% in the restricted strain set. Over the global wild strain set with genotypes at CeNDR in 2020, these numbers are 17-58% and 39%, respectively. "

      To strengthen the claim, the authors should examine the distribution and frequency (perhaps coupled with phylogenetic analysis) of the Ch III haplotype in the wild isolates. The authors should also examine the GWAS peak for the signature of balancing selection (e.g., dN/dS ratio).

      __Response: __Thank you for this comment. The different associated SNPs in Table S3 differ in their allele frequency (Table S3), hence they belong to different haplotypes. We added a supplementary Figure S2 with an analysis of the haplotype structure. Those at a low frequency (around 20%) belong to the same haplotype (e.g. JU775 and MY10) but some associated alleles are present in more haplotypes (40-50%), such as JU1793. Even if we neglect recombination, the history of mutations in the region is complex and there is not a single associated haplotype. We now show the genotypes of these different haplotypes at all SNPs in Table S3. We also added Table S4 that shows the co-occurrence of relevant haplotypes in local populations.

      Concerning tests of balancing selection, without knowing the causal polymorphism and linked haplotype, this would be overreaching. We feel confident only in saying that the causal polymorphism(s) is present at a significant frequency. We did, however, add the fact that, irrespective of which polymorphisms are causal, both alleles were found to coexist locally.

      Results: relevant text was added at the end of the GWAS section.

      Discussion: "The co-occurrence of relevant chromosome III haplotypes on multiple continents and in local populations (Table S4) is suggestive of balancing selection; however, a linked locus other than that causing the Mrt phenotype may be involved."

      Does JU775 carry polymorphisms in genes that are known to be involved in Mrt? These genes may genetically interact with the Ch III variant, as suggested by the partial penetrant phenotypes of the introgressed lines. It would be helpful to have a table summarize the variation in these genes.

      __Response: __It is difficult to deduce much from a genomic variant analysis, so we refrain from showing tables of polymorphisms beyond that used for the fine GWAS mapping in Table S3. For example, a non-synonymous SNP may or may not alter protein activity and cis-regulatory elements are difficult to assess. Moreover, an obviously null allele may be compensated by another polymorphism in the background. The JU775 alleles and bam files are publicly available from CeNDR (Erik Andersen's lab): https://caendr.org/data/data-release/c-elegans/latest

      It is curious to me that for experiments with HT115, the expression of the RNAi vectors was induced with IPTG. Is this step necessary? It is known that even the backbone of L4440 could trigger a non-specific RNAi response (PMID: 30838421). I wonder if activating exogenous RNAi response is required for Mrt rescue.

      __Response: __Indeed: this experiment was initially aimed at testing RNAi sensitivity of JU775, thus IPTG was added on the plate (Figure 7, panel B). We therefore repeated the memory experiment with OP50 and without IPTG, with a similar result (Figure 7, panel A).

      In figure 7, it appears that the worms transferred from MG1655/HT115 to OP50 showed an even stronger rescue (higher Mrt value) than the ones constantly on MG1655/HT115. This suggests to me that fluctuations in food composition may strongly affect epigenetic inheritance. Please clarify as this is very interesting, if true.

      __Response: __Note: This answers the comment above (IPTG is not required).

      We indeed noticed this strong rescue but do not wish to make a point of it, as we made no attempt to reproduce this result in the exact same conditions. The experiment in panel B does not show this effect.

      Optional - Numerous studies have shown that SKN-1 regulates metabolism in response to food composition and availability (PMID: 23040073). Additionally, some recent studies have indicated a role of SKN-1 in epigenetic inheritance triggered by exogenous RNAi. In particular, SKN-1 promotes stress-induced epigenetic resetting (PMID: 33729152). I wonder if SKN-1 modulates Mrt based on bacterial diet.

      __Response: __We tested skn-1b/c hypomorphic and gain-of-function mutants in the N2 background on E. coli OP50 and did not see an effect of the skn-1 allele.

      Minor comments Line 47: typo "...they defined..."

      __Response: __We did mean "thus defined".

      Line 100-101: weird sentence structure. Please consider rephrasing.

      __Response: __We simplified to "a wild C. elegans strain can keep the memory of its culture on a suppressing bacterial strain."

      Line 138-139: I don't quite understand what "intermediate-frequency chromosome III alleles" means here. Some SNPs were found in Ch III 4-6Mb? Please expand.

      __Response: __We rephrased to: "because this isolate carries the chromosome III alleles associated in the GWAS analysis with the Mrt phenotype (Table S3)."

      Line 213 - it was unclear to me why the assay was performed at 23C instead of 25C. I later learned in the method section that microsporidia cannot be cultured at 25C. I think it will be helpful to add that information when microsporidia is introduced to improve clarity.

      __Response: __We added: " We used a temperature of 23°C because these microsporidia kill C. elegans too rapidly at 25°C."

      Reviewer #3.

      Minor points 1. Could the authors please define "experimental blocks"

      __Response: __We added the following sentence in Results: "Each Mrt assay started at a certain date constitutes an experimental block."

      1. Legend to supplementary snp table should be completed: define AF, impact, modifier, moderate, AA1, AA2...

      __Response: __This is added in the first sheet of the table. We also simplified the table and removed some of these columns.

      1. Please define "intermediate-frequency allele"

      __Response: __We added in Results: "GWAS can only detect polymorphisms that are at intermediate frequencies in the panel, i.e. cases where both alleles occur at frequencies higher than 5%." We also added below: "For SNPs with a high significance (p-4) in the fine mapping, the frequency of the Mrt associated allele was comprised between 21 and 41% in our GWAS strain set (Table S3); as an example, the Mrt allele of the associated SNP shown in Figure 1F (III:4677491) displayed a frequency of 29% in the restricted strain set."
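      As a rough illustration of the "intermediate frequency" criterion quoted above, the following minimal sketch checks whether both alleles of a biallelic SNP exceed a 5% frequency threshold in a strain panel. The function names, the `"REF"`/`"ALT"` labels, and the 100-strain panel are hypothetical and are not taken from the manuscript or from CeNDR; the sketch also assumes one allele call per strain (i.e. homozygous strains).

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for one SNP, one call per strain.

    Missing calls are given as None and are ignored.
    """
    calls = [g for g in genotypes if g is not None]
    if not calls:
        return {}
    counts = Counter(calls)
    return {allele: n / len(calls) for allele, n in counts.items()}


def is_intermediate_frequency(genotypes, min_freq=0.05):
    """True if the SNP is biallelic and both alleles are above min_freq."""
    freqs = allele_frequencies(genotypes)
    return len(freqs) == 2 and min(freqs.values()) > min_freq


# Hypothetical panel of 100 strains: 29 carry the associated allele.
common_snp = ["REF"] * 71 + ["ALT"] * 29
# Hypothetical rare variant present in a single strain of the panel.
singleton_snp = ["REF"] * 99 + ["ALT"]

print(is_intermediate_frequency(common_snp))     # both alleles above 5%
print(is_intermediate_frequency(singleton_snp))  # rare allele below 5%
```

      Under these assumptions, the first SNP passes the filter and the singleton does not, which mirrors the point that an allele present in a single strain of the dataset cannot be detected by the association test.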

      1. Figure 7 legend: Authors should be more specific in describing the figure: After 10 (A panel), 13 or 20 generations (B panel) on the K-12 strain... What is E. coli OP50 start 'G10'? the 15° stock?

      __Response: __We changed to: " After 10 (A panel), 13 or 20 generations (B panel) on the K-12 strain" and added some details in:

      "A control from a 15°C culture maintained without starvation ("15°C stock") was bleached in parallel (labeled "E. coli OP50 start "G10" " in the graph of panel A)."

      Optional: Did the authors attempt to rescue the Mrt phenotype with individual metabolites (eg Vit B12...)? These are not straight forward experiments and most likely part of a future study.

      __Response: __We indeed tested several metabolites that are known to differ in C. elegans raised on E. coli OP50 versus K-12 strains for their effect on the Mrt phenotype. None was able to rescue the mortal germline phenotype. However, especially in these long multigenerational experiments, it is difficult to know whether the metabolites are stable. We monitored vitamin B12 activity by using an acdh-1::GFP reporter that is known to be repressed by vitamin B12 - so we are confident of this negative result, which we now show in Figure S4. As cell wall lipopolysaccharide (LPS) differ between E. coli K-12 and B strains, we also tested the E. coli LPS mutants, which had no eff

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The cerebral cortex, or surface of the brain, is where humans do most of their conscious thinking. In humans, the grooves (sulci) and bumps (convolutions) have a particular pattern in a region of the frontal lobe called Broca's area, which is important for language. Specialists study features imprinted on the internal surfaces of braincases in early hominins by casting their interiors, which produces so-called endocasts. A major question about hominin brain evolution concerns when, where, and in which fossils a humanlike Broca's area first emerged, the answer to which may have implications for the emergence of language. The researchers used advanced imaging technology to study the endocast of a hominin (KNM-ER 3732) that lived about 1.9 million years ago (Ma) in Kenya to test a recently published hypothesis that Broca's remained primitive (apelike) prior to around 1.5 Ma. The results are consistent with the hypothesis and raise new questions about whether endocasts can be used to identify the genus and/or species of fossils.

      We would like to thank Rev. 1 for their comments on our paper.

      Reviewer #2 (Public Review):

      The authors tried to support the hypothesis that early Homo still had a primitive condition of Broca's cap (the region in fossil endocasts corresponding to Broca's area in the brain), being more similar to the condition in chimpanzees than in humans. The evidence from the described individual points to this direction but there are some flaws in the argumentation.

      We are grateful to Rev. 2 for their comments, although we only partially agree with some of them.

      First, we would like to rectify the statement of Rev. 2 that we “tried to support the hypothesis that early Homo still had a primitive condition of Broca's cap”: our aim was to test this hypothesis, not to validate it.

      First, only one human and one chimpanzee were used for comparison, although we know that patterns of brain convolutions (and in addition how they leave imprints in the endocranial bones) are very variable.

      We understand the point raised by Rev. 2 about the variation of brain convolutions in humans and chimpanzees. We used atlases published by Connolly (1950), Falk et al. (2018) and de Jager et al. (2019, 2022) to analyse the endocast of KNM-ER 3732 and compare it to the extant human and chimpanzee cerebral conditions. However, in Figure 2, for the sake of clarity, only two specimens (one Homo and one Pan) were used to illustrate the comparison (as has been done in other published papers, e.g., Carlson et al., 2011 Science; Gunz et al., 2020 Sci Adv). In the revised version, we modified the manuscript to further explain our approach (line 156): “We used brain and endocast atlases published in Connolly (1950), Falk et al. (2018) and de Jager et al. (2019, 2022; see also www.endomap.org) for comparing the pattern identified in KNM-ER 3732 to those described in extant humans and chimpanzees. To the best of our knowledge, these atlases are the most extensive atlases of extant human and chimpanzee brains/endocasts available to date and are widely used in the literature to explore variability in sulcal patterns. In Figure 2, the extant human and chimpanzee conditions are illustrated by one extant human (adult female) and one extant chimpanzee (adult female) specimens from the Pretoria Bone Collection at the University of Pretoria (South Africa) and in the Royal Museum for Central Africa in Tervuren (Belgium), respectively (Beaudet et al., 2018).”.

      Second, the evidence from this fossil specimen adds to the evidence of previously described individuals but still does not fully prove the hypothesis.

      We tempered our discussion by concluding that (line 116) “Overall, the present study not only demonstrates that Ponce de León et al.’s (2021) hypothesis of a primitive brain of early Homo cannot be rejected, but also adds information […]”.

      Third, there is a vicious circle in using primitive and derived features to define a fossil species and then using (the same or different) features to argue that one feature is primitive or derived in a given species. In this case, we expect members of early Homo to be derived compared to their predecessors of the genus Australopithecus and that's why it seems intriguing and/or surprising to argue that early Homo has primitive features. However, we should expect that there is some kind of continuum or mosaic in a time in which a genus "evolves into" another genus. This discussion requires far more discussions about the concepts we use, maybe less discussion about what is different between the two groups but more discussion about the evolutionary processes behind them.

      We fully agree with Rev. 2 on this aspect. We believe that identifying these differences/similarities between fossil and extant hominids constitutes the first step toward a better understanding of the evolutionary mechanisms. Our work indeed suggests a certain continuity between genera and raises questions about the genus concept and how to interpret the specimens currently attributed to early Homo. In the revised version of the manuscript we included a reference to this possible scenario (line 134): “[…] or to the absence of a definite threshold between the two genera based on the morphoarchitecture of their endocasts (Wood and Collard, 1999).”.

      Fourth, the data of convolutional imprints presented are rather subjective when identifying which impressions represent which brain convolutions. Not seeing an impression does not necessarily mean that the corresponding brain feature did not exist. Interestingly, the manuscript does not mention and discuss at all the frontoorbital sulcus. This is a sulcus that usually runs from the orbital surface of the frontal lobe up to divide the inferior frontal gyrus in chimpanzees, a condition totally different than in humans who do not have a frontoorbital sulcus. Could such a sulcus be identified, this would provide a far more convincing argument for a primitive condition in this specimen. In Australopithecus sediba, e.g., the condition in this region seems to be a mosaic in which some aspects of the morphology seem to be more modern while one of the sulcal impressions can well be interpreted as a short frontoorbital sulcus. For this specimen, by the way, I would come back to my third point above: some experts in the field might argue that this specimen could belong to Homo rather than Australopithecus...

      We agree that the presence of a fronto-orbital sulcus would be more conclusive. However, this sulcus has not been identified in KNM-ER3732 and the region in which we would expect to find it is not preserved. As demonstrated by Ponce de León et al. (2021), because of the topographic relationships between sulci (and cranial structures), it is possible to interpret imprints on endocasts and the evolutionary polarity of some traits even in the absence of landmarks such as the fronto-orbital sulcus. In Australopithecus sediba the main derived feature of the endocast corresponds to the ventrolateral bulge in the left inferior frontal gyrus, and not to the sulcal pattern itself (Carlson et al., 2011 Science). However, the discussion around the taxonomic status of this taxon confirms the urgent need for reconsidering specimens from that time period and clarifying the mosaic-like or concerted evolution of the derived Homo-like traits within our lineage. Regarding the subjective nature of this approach, we invite readers to examine the specimen on MorphoSource (https://www.morphosource.org/concern/media/000497752?locale=en) and to request access from the National Museums of Kenya to the physical or virtual specimen in order to falsify our hypothesis.

      According to my arguments above, I think that this manuscript might revive interesting discussions about this topic but it is not likely to settle them because the data presented are not strong enough to fully support the hypothesis.

      We would be more than happy to consider new/other specimens with similar chronological and geographical contexts and investigate further this hypothesis in the future.

      Reviewer #3 (Public Review):

      The authors provide a detailed analysis of the sulcal and sutural imprints preserved on the natural endocast and associated cranial vault fragments of the KNM-ER3732 early Homo specimen. The analyses indicate a primitive ape-like organization of this specimen's frontal cortex. Given the geological age of around 1.9 million years, this is the earliest well-documented evidence of a primitive brain organization in African Homo.

      In the discussion, the authors re-assess one of the central questions regarding the evolution of early Homo: was there species diversity, and if yes, how can we ascertain it? The specimen KNM-ER1470 has assumed a central role in this debate because it purportedly shows a more advanced organization of the frontal cortex compared to other largely coeval specimens (Falk, 1983). However, as outlined in Ponce de León et al. 2021 (Supplementary Materials), the imprints on the ER1470 endocranium are unlikely to represent sulcal structures and are more likely to reflect taphonomic fracturing and distortion. Dean Falk, the author of the 1983 study, basically shares this view (personal communication). Overall, I agree with the authors that the hypothesis to be tested is the following: did early Homo populations with primitive versus derived frontal lobe organizations coexist in Africa, and did they represent distinct species?

      I greatly appreciate that the authors make available the 3D surface data of this interesting endocast.

      We are grateful to Rev. 3 for their comments and for contextualizing our finding. We would also like to point out that, although the 3D surface can be viewed on MorphoSource, permission from the National Museums of Kenya has to be requested for studying the specimen and getting access to the physical specimen and/or the 3D model.

      Reviewer #1 (Recommendations For The Authors):

      Holloway, Broadfield & Yuan (2004) estimate ER 3732 as having a cranial capacity of 750 cc, which is larger than chimps and australopiths and similar to ER 1470 (752 cc, same reference). (That for Dmanisi 2282 is somewhat smaller at around 650 cc.) Cranial capacities should be mentioned along with added discussion about possible allometric scaling of (increased) numbers of sulci with increasing brain size as well as possible shifts in locations of sulci relative to cranial sutures in larger-brained (including due to ontogenetic maturation) individuals/species. Could these variables (especially brain size) be relevant for your discussion/conclusions?

      We thank Rev. 1 for their suggestion. We included the estimate by Holloway et al. (2004) (line 95): “Holloway et al. (2004) estimated the endocranial volume as about 750-800 cc but insisted on the low reliability of their estimate.”. Additionally, we raised the possibility of an allometric effect for future discussion (line 149): “In parallel, the possibility of allometric scaling and influence of brain size on sulcal patterns in early Homo has to be further explored.”.

      From the two figures, it appears that the authors produced a virtual endocast from the cranial remains of ER 3732 and compared its features with those seen on a virtual reproduction of the corresponding natural endocast. If so, this needs to be clarified in the text, not just the figures.

      We thank Rev. 1 for their suggestions that were integrated.

      Reviewer #3 (Recommendations For The Authors):

      While the sulcal imprints on the left hemisphere can be interpreted unambiguously, the anatomical assignment of those on the right side may need to be reconsidered, as they are more ambiguous. For example, the postcentral sulcus (pt) almost touches the middle frontal sulcus, which is an unlikely natural configuration.

      We agree that the configuration on the right hemisphere is intriguing, especially when compared to the extant human and chimpanzee atlases. As such, we decided to change the label for what we think could be the inferior frontal sulcus and leave a question mark instead.

      I encourage the authors to include:

      • a posterior view in Figure 1, and mark the lambdoid suture, parts of which seem to be preserved especially on the left side. This will help the readership to better understand which parts of the endocranial morphology are preserved.

      • a scale bar would be of great utility to appreciate the small size of this specimen. The distance from bregma to the Broca cap seems to be short, indicating an endocranial volume much smaller than the published estimate of 750 ccm. Perhaps the authors can provide a new estimate, which would provide further support for the arguments proposed in the discussion section, especially the question of any presence of Australopithecus at Koobi Fora.

      We included a posterior view of the specimen and a scale bar in Figure 1 and modified the legend accordingly. Unfortunately, we were not able to identify with certainty the feature that could correspond to the lambdoid suture. We might see the impression where the parietal bone meets the occipital bone, but there is a risk of misidentification (which is an issue frequently raised in the literature, see for example Gunz et al. 2020 Sci Adv). Concerning the endocranial volume, in the revised version of the manuscript we included the estimate by Holloway et al. (2004). Because the specimen only preserves the superior part, we are reluctant to provide an estimate of the total volume. However, we agree that this would be an interesting feature to integrate in the interpretation of this specimen.

      Minor points

      • This sentence needs to be clarified: «The superior temporal sulcus nearly intersects the lateral fissure on the right hemisphere».

      • The terms «Broca's region» and «orbital cap» need some more context. Do the authors mean «Broca's cap» in either instance?

      We clarified/modified when needed, thank you very much.

      We included minor corrections in addition to those recommended by the reviewers:

      -Lines 50, 74, 142, 149: “Broca’s area” instead of “Broca’s cap”

      -Line 73: “in the pre-1.5 Ma Homo specimen” instead of “in pre-1.5 Ma Homo specimen”

      -Line 100: we specified “in human brains and endocasts”

      -Line 120: “sulcal pattern” instead of “sulcal patterns”

      -Line 144: “behaviors” (plural)

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Major comments:

      1. A control group of mice fed chow diet is needed to distinguish the effects of the genotype from those caused by diet. What is the phenotype of regular chow-fed mice in terms of energy metabolism and thermogenesis?

      We are sincerely grateful to Reviewer 1 for raising an important question regarding the need for a control group of mice fed chow diet.

      To address this concern, we have conducted experiments on mice fed a regular chow diet and measured their phenotype in terms of energy metabolism and thermogenesis. In addition, to confirm that the phenotype is also present relative to littermates, we included both chow-fed CD4-Cre mice and littermates (MKK3/6f/f) as controls. Our findings reveal that chow-fed MKK3/6CD4-KO mice showed increased brown adipose tissue (BAT) thermogenesis compared with CD4-Cre mice and littermates. This phenotype is similar to that observed in HFD-fed mice. These results also indicate that the same phenotype is observed in comparison with littermates, which adds an extra control to the study.

      To further investigate the effect on energy metabolism, we utilized metabolic cages. The data from these experiments align with the increased thermogenesis observed in MKK3/6CD4-KO mice fed a chow diet, as they also demonstrated increased energy expenditure. We thank the reviewer for this suggestion as we believe that these new data strengthen our conclusion significantly.

      We have incorporated these essential findings into Supplementary Figure 2C-D of the manuscript.

      1. While an increase in BAT temperature (as demonstrated here by infrared imaging) is in line with increased thermogenesis, it will be critical to verify this hypothesis by indirect calorimetry. Energy expenditure, food intake, and activity measures should be added for regular and DIO mice. Please follow the guidelines for ANCOVA analysis and measurements explained in PMID: 22205519 and PMID: 21177944.

      We are grateful to Reviewer 1 for bringing up an essential point concerning the need to verify our hypothesis on increased BAT temperature and thermogenesis through indirect calorimetry. We acknowledge the importance of including energy expenditure, food intake, and activity measures for both regular and DIO mice to strengthen our study.

      To address this valuable suggestion, we analyzed chow-fed mice in metabolic cages. The data from these experiments align with the increased thermogenesis observed in MKK3/6CD4-KO mice fed a chow diet, as they also demonstrated increased energy expenditure, without differences in food intake or locomotor activity. We thank the reviewer for this suggestion as we believe that these new data strengthen our conclusion significantly. These new data are now in Supplementary Figure 2A-B.

      In addition, we have initiated a new experimental group of age-matched mice on HFD, which we will carefully feed for 8 weeks. Following this dietary period, we will subject the mice to metabolic cage analysis, allowing us to obtain accurate data on energy expenditure, food intake, and activity levels. These additional measurements will provide a comprehensive understanding of the metabolic changes induced by MKK3/6 deficiency in T cells under different dietary conditions.
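      To make the ANCOVA requested by the reviewer more concrete, the following minimal sketch computes covariate-adjusted group means for energy expenditure. It assumes that body mass is the covariate and that both groups share a common (pooled within-group) slope; the function name, group labels, units, and all numbers are hypothetical illustration values, not data from this study, and a full analysis would also test the group effect and the equal-slope assumption.

```python
def mean(values):
    return sum(values) / len(values)


def ancova_adjusted_means(groups):
    """Covariate-adjusted means under a common-slope ANCOVA model.

    groups: {name: [(body_mass, energy_expenditure), ...]}
    Returns (pooled_slope, {name: adjusted_mean}), where each adjusted
    mean is the group mean shifted to the grand mean of body mass.
    """
    grand_x = mean([x for pts in groups.values() for x, _ in pts])
    sxx = 0.0  # pooled within-group sum of squares of the covariate
    sxy = 0.0  # pooled within-group cross-products
    group_means = {}
    for name, pts in groups.items():
        mx = mean([x for x, _ in pts])
        my = mean([y for _, y in pts])
        sxx += sum((x - mx) ** 2 for x, _ in pts)
        sxy += sum((x - mx) * (y - my) for x, y in pts)
        group_means[name] = (mx, my)
    slope = sxy / sxx
    adjusted = {
        name: my - slope * (mx - grand_x)
        for name, (mx, my) in group_means.items()
    }
    return slope, adjusted


# Hypothetical data: (body mass in g, energy expenditure in kcal/h).
data = {
    "control": [(24, 0.48), (26, 0.52), (28, 0.56)],
    "knockout": [(20, 0.50), (22, 0.54), (24, 0.58)],
}
slope, adjusted = ancova_adjusted_means(data)
print(slope, adjusted)
```

      In this made-up example the raw group means differ only slightly because the lighter group has a lower body mass; comparing the groups at a common body mass makes the difference in energy expenditure more apparent, which is the motivation for adjusting by a covariate rather than dividing by body weight.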

      1. That the phenotype is still seen at isothermal housing is interesting but should be backed up by direct assessment of thermogenic capacity (see PMID: 21177944). In the end, it could also be increased heat loss, independently of heat production. Whether the browning is cause or consequence then remains unclear.

      Thank you for raising this important point. Indeed, it is essential to corroborate the observed phenotype with direct assessments of thermogenic capacity to gain a comprehensive understanding of the underlying mechanisms. The study mentioned in PMID: 21177944 highlights the significance of evaluating thermogenesis directly to support the findings.

      According to your suggestion, we plan to house the animals at 30 ºC for four weeks and subsequently inject norepinephrine to evaluate thermogenesis capacity while measuring brown adipose tissue (BAT) activation. This approach should provide valuable insights into the thermogenic potential of the animals under isothermal conditions.

      However, we will not be able to conduct the experiment in metabolic cages at 30 ºC because our system cannot be operated at that temperature. For this reason, we will measure BAT temperature to analyze this experiment.

      1. Regarding the in vitro data, a thermogenic phenotype should be functionally verified by Seahorse analysis.

      We thank Reviewer 1 for raising an important point concerning the need for functional verification of the thermogenic phenotype observed in our in vitro data using Seahorse analysis.

      In response to this valuable suggestion, we performed Seahorse analysis in differentiated adipocytes treated with or without IL-35 for 48 hours. The results demonstrated a slight increase in basal metabolism and a heightened response to isoproterenol (ISO) stimulation of β3 adrenergic receptors in adipocytes after IL-35 treatment. These findings provide functional evidence supporting the thermogenic phenotype induced by IL-35 in adipocytes.

      We have thoughtfully included this essential data in Figure 2 of this revision plan, allowing reviewers and the scientific community to comprehensively evaluate and validate the functional implications of our findings.

      1. Mechanistically, there is no epistasis-type experiment showing that IL-35 influences Ucp1 levels via ATF2; the data remain associative in nature.

      Thank you for your valuable comment. We agree that establishing a mechanistic link between IL-35 and Ucp1 levels will improve the strength of the manuscript.

      To delve deeper into the mechanism through which IL-35 influences Ucp1 expression, we focused on the role of ATF2, a transcription factor known to be involved in regulating UCP1 levels (PMID: 11369767 and PMID: 15024092). In our investigation, we treated adipocytes with IL-35 both in the presence and absence of an inhibitor targeting the ATF2 pathway. The results were illuminating as we observed a significant reduction in the expression of Ucp1 when the ATF2 pathway was inhibited.

      These findings indicate that ATF2 is indeed a crucial mediator of the effects of IL-35 on Ucp1 levels. By inhibiting the ATF2 pathway, we demonstrate a direct functional link between IL-35 and the expression of Ucp1, providing mechanistic insights into the regulatory role of IL-35 in thermogenesis. We included new results in Figure 7F.

      1. What are other consequences of injecting IL-35? Is it good or bad? What is the therapeutic potential in DIO mice? Also, in these experiments (Fig. 7) indirect calorimetry as described would be supportive of the claims.

      Regarding the consequences of injecting IL-35, we have already performed experiments to analyze its effect. Our findings indicate that IL-35 increases thermogenesis in BAT (Figure 7), suggesting that it may play a role in promoting energy expenditure, which could be beneficial in combating diet-induced obesity (DIO) in mice. Importantly, we did not observe any negative effects of IL-35 in our experiments.

      Based on these promising results, we anticipate that IL-35 has therapeutic potential in DIO mice. By promoting thermogenesis in BAT, IL-35 may offer a novel approach to manage obesity and related metabolic disorders. However, we acknowledge that further comprehensive studies are needed to fully understand its therapeutic benefits and potential side effects.

      In future work, we plan to evaluate a targeted delivery system for IL-35. We are currently generating IL-35 loaded metal-organic frameworks (MOFs) labeled with adipose tissue-specific peptides. This strategy aims to enhance the delivery of IL-35 to adipose tissue, potentially maximizing its effects in the relevant areas.

      Minor comments:

      1. The authors claim that their HFD-fed MKK3/6CD4-KO mice are protected against hyperglycemia, but only fasted/fed blood glucose tests are performed. Lower glucose levels could be explained due to a hyperinsulinemic state in response to growing insulin resistance in the presence of HFD. It would be sensible to perform both glucose and insulin tolerance tests to back up your statement.

      Thank you for your insightful comment. We agree that to support our claim of protection against hyperglycemia in HFD-fed MKK3/6CD4-KO mice, further tests are necessary beyond fasted/fed blood glucose measurements.

      In response to your suggestion, we conducted both glucose tolerance tests (GTT) and insulin tolerance tests (ITT) in HFD-fed MKK3/6CD4-KO mice. We did not observe differences in glucose tolerance, but the ITT showed significantly enhanced insulin sensitivity compared to control mice. These findings provide evidence that the protection against hyperglycemia in HFD-fed MKK3/6CD4-KO mice is not solely due to a hyperinsulinemic state, but rather indicates genuine improvements in glucose handling and insulin response.

      We have thoughtfully included these crucial data in the revised version of the manuscript, both in the main text and Supplementary Figure 4. We extend our appreciation to the reviewer for this valuable suggestion, which has enhanced the scientific rigor and completeness of our study.

      1. Please provide the loading control for p38 and S6 blots (Figure 6G).

      Thank you for the comment. The loading control used for the P-p38 and P-S6 blots in Figure 6G is β-actin. Because the amount of lysate obtained per sample is very limited, we can run only a couple of blots from the same sample, so β-actin is the only loading control we can provide. We apologize for this limitation, but it is necessary to avoid using too many mice for ethical reasons, as the samples come from a large number of mice.

      1. Statistical test from Figure 7B should be a t-test, since it is only comparing 2 variables (PBS vs IL-35), and not a 2-way ANOVA as described in the legend.

      We sincerely thank the reviewer for the comment. It was indeed a mistake in the text. While we have performed a t-test, there was an error in the legend that we have now corrected. We apologize for any confusion this may have caused and appreciate the opportunity to rectify the oversight.
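      The reviewer's point, that a two-group comparison (PBS vs IL-35) calls for a t-test rather than a two-way ANOVA, can be illustrated with a minimal sketch. It assumes an unpaired Student's t-test with equal variances; the response does not state which t-test variant was used, and the numbers below are hypothetical placeholders, not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def students_t(group_a, group_b):
    """Return (t, df) for an unpaired two-sample Student's t-test.

    Assumes equal variances in both groups (an assumption of this sketch).
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, n_a + n_b - 2


# Hypothetical placeholder readings (NOT data from the study).
pbs = [1.0, 2.0, 3.0]
il35 = [2.0, 3.0, 4.0]
t_stat, df = students_t(il35, pbs)
```

      Converting the statistic to a p-value requires the t distribution, which the Python standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_ind`) would normally be used for that step.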

      1. Label correctly the panels in the figures - examples: Fig 3, panels C and D are interchanged; reference in the text to Fig S1G even though the figure only has panels A-F; Fig 7 legend refers to the statistical test of panel E when the figure only has A-D.

      We sincerely apologize for any mistakes in our manuscript that may have caused difficulties while reading the article and potentially led to misleading results. We are grateful to Reviewer #1 for bringing these errors to our attention. Thanks to their diligent review, we have been able to identify and rectify the issues in our manuscript. The necessary corrections have been made, ensuring the accuracy and reliability of our research. We greatly appreciate the reviewer's valuable feedback and contribution to improving the quality of our work.

      1. There are several typos along the text, please revise (example: page 4;line 4 -"tremorgenic")

      We apologize for the presence of any typos in the initial version of the article. We have thoroughly revised the manuscript to correct these errors. Thank you for bringing this to our attention and helping us improve the accuracy and clarity of our work.

      Reviewer #1 (Significance):

      The manuscript is well written, and the research conducted properly, even though a thorough analysis of energy metabolism in mice and cells is missing and the mechanistic claims are based on relatively thin data.

      The immune system and inflammation play important roles for obesity and insulin resistance, yet the roles they play in thermogenic adipocytes remains unclear. This work adds novel aspects to this relationship.

      Reviewer #2 (Evidence, reproducibility and clarity):

      This manuscript by Nikolic et al sought to investigate the role of p38 activation in adipose tissue Treg cells and obesity. They found that the expression of p38a, its upstream kinase MKK6, and downstream substrate ATF2 was upregulated specifically in adipose T cells associated with human obesity. They generated T cell-specific knockout MKK3/6 in mice and found these animals were protected from diet-induced obesity as a result of increased BAT thermogenesis. Mechanistically, loss of p38a activation promoted adipose tissue accumulation of Treg cells, leading to elevated IL-35 availability and UCP1 expression.

      Major comments:

      1. They attributed the obesity protection to energy expenditure; however, food intake and intestinal absorption were never tested. Immune cells, particularly Treg cells, are important modulators of nutrient uptake.

      We are sincerely grateful to Reviewer #2 for this crucial comment, highlighting the importance of assessing not only energy expenditure but also food intake and intestinal absorption in our study.

      In response to this valuable suggestion, we have initiated an HFD experiment to comprehensively examine food intake and intestinal absorption. For food intake analysis, we are employing metabolic cages, which will allow us to monitor and quantify the amount of food consumed by the mice accurately. Additionally, we plan to follow the methodology outlined in the study by Kraus et al. (PMID: 27110587) to measure lipid content in feces, enabling us to evaluate intestinal absorption.

      By conducting these additional experiments, we aim to gain a deeper understanding of the potential role of Treg cells, known immune modulators of nutrient uptake, in our observed obesity protection phenotype.

      1. At thermoneutrality, BAT is inactive even though UCP1 expression is still present (not activated). That MKK3/6 deficiency in T cells still confers protection against obesity at thermoneutrality suggests it regulates other energy balance components in addition to BAT thermogenesis.

      Thanks for the comment. We believe that the effects of IL35 on thermogenesis are likely partly mediated by alternative mechanisms, as we did not observe an increase in UCP1 gene expression in BAT in vivo (Figure 3D of the manuscript), and the increase in thermogenesis is still present even at thermoneutrality where UCP1 is inactive (Figure 4E of the manuscript). This suggests that IL35 might regulate other alternative pathways that control BAT thermogenesis.

      While our current findings provide valuable insights, further experiments may be necessary to fully understand the underlying mechanisms. For instance, conducting experiments with transgenic mice expressing IL35 or using IL35 knockout (KO) mice could shed more light on the specific pathways through which IL35 exerts its effects on thermogenesis and energy balance.

      In conclusion, we hypothesize that IL35's effects on thermogenesis are mediated partly by alternative mechanisms beyond UCP1 activation, and its ability to enhance thermogenesis even at thermoneutrality highlights its potential as a regulator of energy balance. We plan to further investigate the specific mechanisms through which IL35 impacts thermogenesis and energy balance. To achieve this, we will consider conducting experiments with transgenic mice expressing IL35 or using IL35 knockout (KO) mice in follow up studies. This is now discussed in our manuscript.

      1. Loss of adipose Treg cells (such as Pparg KO, Foxp3-DTR) did not lead to obvious obesity phenotypes. Gain-of-function Treg cells (such as adoptive transfer, IL-2/IL-2 Ab) did not result in profound obesity protection as observed in MKK3/6 CD4-KO mice. It suggests that MKK3/6 KO in T cells causes other immune defects (besides Tregs).

      We agree with the referee's assessment regarding the lack of obvious obesity phenotypes in the above-mentioned animal models. The results we observed in our MKK3/6CD4-KO mice suggest that the p38 signaling pathway in T cells may modulate their function, leading to an upregulation of IL35 expression, which could be a contributing factor to the significant obesity protection observed in MKK3/6CD4-KO mice. We believe that IL35's effects on energy balance and thermogenesis are critical components of the observed protection against obesity in this model.

      Regarding the studies with PPAR KO in Treg cells, it is important to note that they did not specifically focus on the effect of thermogenesis. While they observed a general tendency of increased fat deposition when treated with a PPAR agonist in the Treg deficient PPAR KO mice, these findings were not extensively studied in that particular paper. Thus, additional research is necessary to specifically evaluate thermogenesis in these mice and further understand the role of PPAR in Treg-mediated thermogenic processes.

      We also acknowledge the presence of contradictory results from loss-of-function experiments of Treg cells in mice. The observed metabolic changes may be context-dependent, and the impact of Treg cells on metabolism might vary under different physiological conditions. For instance, in lean conditions where adipose tissue inflammation is low, a decrease in VAT Treg cells might not lead to significant metabolic changes. However, under certain circumstances, such as obesity, VAT Treg cells may play a critical role in regulating metabolism. In this context, increasing a population that is reduced during obesity could result in improved metabolic performance.

      In conclusion, our findings suggest that the lack of p38 activation in Treg cells may prevent the dramatic down-regulation and loss of function observed in Treg cells during obesity. This preservation of Treg function could be a significant factor driving the observed protection against obesity in MKK3/6CD4-KO mice.

      While further studies are required to elucidate the precise timing and spatial aspects of the specific functions of adipose-resident Treg cells, it is evident that these cells play a crucial role in maintaining immune and metabolic homeostasis. They achieve this, in part, by regulating adipose inflammation, insulin sensitivity, lipolysis, and thermogenesis. This is now discussed in our manuscript.

      1. The increase in IL-35 seemed to be very moderate, compared to the metabolic phenotypes. It raises the question if IL-35 is responsible for BAT activation and reduced weight gain. It is unclear what systemic and local levels of IL-35 were reached after recombinant IL-35 treatment (Fig. 7B). IL-35 antibody blockade experiment in KO mice is recommended.

      Physiological changes in cytokines can indeed have a significant impact on the metabolic profile due to their continuous and intricate interactions. Even minor alterations in the overall cytokine milieu can result in substantial changes in metabolism (doi.org/10.1073/pnas.1215840110). In fact, it is well-established that in humans, small changes in cytokine profiles between genders, in obesity, and during aging can play a critical role in the development of pathology. These cytokines often operate in a chronic manner, exerting long-term effects on various physiological processes (doi.org/10.1038/s41467-020-14396-9).

      In summary, the dynamic interplay of cytokines in metabolism can lead to significant metabolic changes even with subtle alterations in their levels. While the increase in IL-35 may appear moderate, our findings using recombinant IL35 indicate that IL-35 increases thermogenesis in BAT, suggesting that it may play a role in promoting energy expenditure, which could be beneficial in combating diet-induced obesity (DIO) in mice. Importantly, we did not observe any negative effects of IL-35 in our experiments.

      1. IL-35 induced p-ATF2 is acute and transient (Fig. 7D) and it was able to increase BAT temperature in just 4 h (Fig. 7B). However, Ucp1 transcription and translation generally take much longer time (e.g. 2d in Fig. 7C). IL-35 may increase energy expenditure through UCP1-independent mechanisms.

      Thanks for the comment. As previously mentioned, we believe that the effects of IL35 on thermogenesis might be mediated by alternative mechanisms, as we did not observe an increase in UCP1 gene expression in BAT, and the increase in thermogenesis is still present even at thermoneutrality where UCP1 is inactive. This suggests that IL35 might regulate other alternative pathways that control BAT thermogenesis.

      While our current findings provide valuable insights, further experiments may be necessary to fully understand the underlying mechanisms. For instance, conducting experiments with transgenic mice expressing IL35 or using IL35 knockout (KO) mice could shed more light on the specific pathways through which IL35 exerts its effects on thermogenesis and energy balance. We plan to further investigate the specific mechanisms through which IL35 impacts thermogenesis and energy balance. To achieve this, we will consider conducting experiments with transgenic mice expressing IL35 or using IL35 knockout (KO) mice in follow up studies. This is now discussed in our manuscript.

      Minor comments:

      1. The gating of Treg cells should exclude CD25- cells. Single positive (CD25+ or Foxp3+) cells are progenitors of Tregs. In addition to number, phenotypic activation of Treg cells should also be determined.

      Thank you for the comment. We have reanalyzed our data excluding CD25- cells; the results are now included in Figure 5A of the manuscript and in the new Supplementary Figure 7 of the revised manuscript. We also checked CD69+ and KLRG1+ Treg cells and observed no differences between genotypes. We also included figures in this revision plan (Figure 5 and 6).

      1. ATF is also important for adipogenesis, is the adipogenic differentiation of BAT SVF cells affected by MKK3/6 KO or IL-35 treatment?

      We appreciate the reviewer's observation regarding the importance of ATF in adipogenesis. To investigate this aspect further, we performed in vitro differentiation of adipocytes and treated them with IL-35 in the presence or absence of an inhibitor targeting the upstream activator of ATF.

      The results were compelling, as IL-35 treatment led to an increase in the expression of adipogenic markers, including Pparg, Adipoq, Leptin, and Perilipin. In contrast, inhibiting ATF activation resulted in a reduction of these adipogenic markers. These findings provide strong evidence that ATF plays a significant role in mediating the effects of IL-35 on adipogenesis.

      We have thoughtfully included these essential data in Figure 7G of the manuscript. We extend our gratitude to the reviewer for their keen observation, which has enhanced the scientific depth and completeness of our study.

      1. Metabolic cage experiments are desired to determine whole-body energy balance, including food intake, physical activity, and heat production.

      To address this valuable suggestion, we have taken immediate action. We utilized metabolic cages in mice under chow diet. The data from these experiments align with the increased thermogenesis observed in MKK3/6CD4-KO mice fed a chow diet, as they also demonstrated increased energy expenditure, without differences in food intake or locomotor activity. We thank the reviewer for this suggestion as we believe that these new data strengthen our conclusion significantly. The new data are included in Supplementary figure 2 A-B.

      In addition, we have initiated a new experimental group of age-matched mice on HFD, which we will carefully feed for 8 weeks. Following this dietary period, we will subject the mice to metabolic cage analysis, allowing us to obtain accurate data on energy expenditure, food intake, and activity levels. These additional measurements will provide a comprehensive understanding of the metabolic changes induced by MKK3/6 deficiency in T cells under different dietary conditions.

      1. Total UCP1 expression (both RNA and protein) in the whole BAT from an animal should be determined (since BAT is smaller in KO mice).

      Thank you for this comment. Yes, we have measured UCP1 expression in the whole BAT from the animals; the data are shown in Figures 3C and 3D and here. Although in vitro studies indicated that IL35 increases UCP1 in adipocytes, we were not able to find an increase of this protein in BAT.

      We believe that the effects of IL35 on thermogenesis are likely partly mediated by alternative mechanisms, as we did not observe an increase in UCP1 gene expression in BAT in vivo, and the increase in thermogenesis is still present even at thermoneutrality where UCP1 is inactive (Figure 4E of the manuscript). This suggests that IL35 might regulate other alternative pathways that control BAT thermogenesis.

      1. Fig. 6C, IL-35-expressing Treg cells should be quantified from adipose tissue.

      We appreciate the referee's suggestion to quantify IL-35-expressing Treg cells from adipose tissue in Fig. 6C. While we agree that this would be valuable information, we encountered technical challenges that made it impractical to measure IL-35 directly in Treg cells from the visceral adipose tissue (VAT).

      One of the main technical challenges we encountered is the low number of Treg cells present in the adipose tissue, making it difficult to obtain sufficient cell material for accurate quantification of IL-35. Treg cells are relatively rare compared to other immune cell populations in the adipose tissue, and their extraction and analysis can be technically demanding.

      Reviewer #2 (Significance):

      The manuscript is innovative in defining the novel role of p38 activation in the T cell compartment and its metabolic regulation. The involvement of Treg cells in adipose tissue homeostasis has been well documented and Treg cell-derived IL-35 has been demonstrated in immune regulation. The authors provided a relatively thorough description of the altered metabolism in these Mkk3/6 CD4-KO mice; however, the reviewer has doubts if Treg cells and IL-35 are primary mechanisms of the observed protection from obesity. The manuscript would be much stronger if the model were Treg cell-specific KO and/or IL-35 deficiency in Treg cells reverses obesity resistance conferred by MKK3/6 deficiency. It is also suspected that BAT thermogenesis is not the major reason, as BAT deficiency or UCP1 KO results in much milder phenotypes in mice, even at thermoneutrality.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Specific comments:

      1. It's important to use proper controls for mouse metabolic studies. The authors stated that CD4-Cre and MKK3/6 CD4-KO mice are all in the C57BL/6 background. However, it would appear that these two lines were bred separately. The difference in the genetic background, though minor, can lead to the observed phenotype, notably weight gain. Since the metabolic phenotypes seem to be driven by the weight difference, it is even more critical to include additional controls to validate the findings. For instance, crossing MKK3/6 f/f with one copy of CD4-Cre with MKK3/6 f/f to generate age-matched MKK3/6 CD4-KO and MKK3/6 f/f controls should be used to repeat major in vivo studies similar to those in Fig. 2-4.

      We thank the reviewer for the comment. Although every control is important when using conditional mice, several papers indicate that Cre-expressing lines have effects of their own that could be important in metabolism, and several articles strongly recommend using Cre+ lines as controls. For that reason, we used the Cre-expressing line as a control, because we consider it the most appropriate one (Jonkers and Berns, 2002). In fact, The Jackson Laboratory recommends using the Cre-expressing line as a control to avoid side effects that Cre overexpression could have in the tissue of interest (https://biokamikazi.files.wordpress.com/2014/07/cre-lox-imp-notes.pdf).

      However, as this reviewer suggested, we checked that similar results were obtained using littermates as controls and we have now included these data in the manuscript (Supplementary Figure 2D).

      1. The assessment of adipose tissue immune cell population in Fig. 5 was conducted after HFD-induced obesity. As mentioned above, the change in Treg and M2 cell percentage could be due to the body weight difference. The experiment should be repeated (with proper controls) in normal chow and after a few weeks of HFD when Treg numbers start to decline.

      Thank you for the comment. We are currently performing a short HFD experiment to check the Treg and M2 cell populations in adipose tissue using littermates as controls.

      In addition, we checked those cell populations in adipose tissue infiltrates of mice fed a chow diet and observed no differences in the M2 macrophage population between genotypes, while the percentage of Treg cells was actually lower in ND-fed MKK3/6CD4-KO mice (Fig 12 of revision plan). This result suggests that the higher accumulation of Treg cells in mice lacking p38 activation in T cells is specific to the obese state and strengthens our hypothesis that DIO protection in MKK3/6CD4-KO mice is due to the Treg cell population.

      1. Data related to the mechanistic link in Fig. 6/7 are not robust and require a large amount of additional work to substantiate the claim. First of all, the role of IL-35 in BAT thermogenesis remains unclear. It's somewhat surprising to see a single dose of IL-35 i.v. injection is sufficient to increase BAT temperature in Fig. 7B. Minimally, the authors need to demonstrate that IL-35 treatment (perhaps after a few daily doses) is able to increase browning/beiging of fat cells and improve cold tolerance when placing the mice at 4 °C for several hours (and up to 3 days). Serum FGF21 level should also be measured after/during IL-13 treatment. Secondly, ATF2 knockout or knockdown in brown preadipocytes should be employed to demonstrate that IL-35 induced UCP1 and FGF21 expression is ATF2 dependent. Another key experiment is to use IL-35 deficient Treg model to definitively demonstrate the requirement of Treg IL-35 to maintain thermogenesis. However, this can be done in a follow up study.

      We are grateful for all the insightful comments provided by Reviewer #3. We understand the concern, but our ethical permissions limit our ability to perform several sequential i.v. injections in our animal facility. In light of this constraint, we have devised an alternative approach to evaluate the role of IL-35 in adaptive thermogenesis.

      To address this, we conducted a cold tolerance test in both control mice and MKK3/6CD4-KO mice, which express higher levels of IL-35. Our findings revealed that MKK3/6CD4-KO mice exposed to cold conditions were able to preserve their body and brown adipose tissue (BAT) temperature, while the temperature of control CD4-Cre mice gradually dropped during the cold challenge.

      The data from this cold tolerance test support our hypothesis and demonstrate the role of IL-35 in promoting adaptive thermogenesis, leading to enhanced temperature maintenance in MKK3/6CD4-KO mice. These observations have been included in Figure 7B of the manuscript, and detailed results are available in Figure 11 of this revision plan.

      We appreciate the reviewer's valuable input, which has encouraged us to explore alternative experimental approaches to address the research question effectively.

      We agree with Reviewer #3 that using an IL-35 deficient Treg model would be a great approach to confirm our results, but we think that the additional experiments we have performed strengthen our finding that IL-35 has a novel role in controlling adipose tissue thermogenesis.

      Reviewer #3 (Significance):

      Dissipating energy as heat through brown or beige adipocyte-mediated thermogenesis is believed to be an effective way to combat obesity. The current study aims to characterize the p38 signaling pathway in T cells as a potential target to modulate browning or beiging of adipose tissues. This would be of interest to the basic biomedical research community, particularly in the area of immunometabolism. A major limitation is the concern of improper controls for the mouse models, which makes data interpretation difficult. In addition, the mechanistic studies lack in-depth analyses to support the conclusion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for the constructive and detailed feedback provided. We have adopted the proposed changes to improve the manuscript clarity and accessibility. The following revisions are included in the revised manuscript:

      Reviewer #1 (Public Review):

      The analytical framework is not sufficiently explained in the main text.

      We think the reviewer is referring to the conceptual framework mentioned in introduction. In the previously submitted manuscript, we did not provide details because the framework is published elsewhere. However, we agree with the reviewer that a short explanation may be helpful, which we have included in the resubmitted manuscript.

      The significance of findings in relation to functional changes is not clear. What are the consequences of enrichment of RNA transport or ribosome biogenesis pathways between pesticides and recovery stages, for example?

      We thank the reviewer for this suggestion. In the previously submitted manuscript, we included an explanation of the central functions these pathways can alter (e.g. metabolism and infection response). These functions are self-explanatory. However, we have elaborated on the consequence that the disruption of these pathways can cause in the resubmitted manuscript.

      The impact of individual biocides and climate variables, and their additive effects, are assessed but there is no information offered on non-additive interactions (e.g., synergistic, antagonistic).

      This was a misunderstanding based on our use of the term synergistic in this context. The approach by which we define a synergistic or joint effect of two environmental variables on a taxonomic group is explained in the methods section. This analysis is based on climate variables and biocide types contributing the largest covariances in the correlation analysis explained in Supplementary Fig. 5; Step 4. The combined effect of two environmental variables on a taxon was considered to be significant if the biocide type and the climate variable were each significantly correlated with the taxon over the same time window, and their average Pearson correlation was > 0.5 with padj < 0.05 (SWC analysis with 10,000 permutations). The biocide type and the climate variable were interpreted to have a joint effect on a given taxon if the linear combination of the biocide type and the climate variable had a larger Pearson correlation coefficient than each of the correlations between the family and the biocide type and the family and the climate variable individually, in the same time interval with padj < 0.05 (with 10,000 permutations in the SWC analysis). We realise that the use of synergistic or additive was not correct in this context and have replaced the term synergistic with joint effect throughout the manuscript.
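      The joint-effect criterion described above (the combination of a biocide type and a climate variable must correlate with the taxon more strongly than either variable alone) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes the "linear combination" is an equal-weight sum of z-scored variables, it ignores the sliding time windows of the SWC analysis, and it omits the permutation-based significance step. All function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def zscore(x):
    """Centre a series on zero and scale it to unit (population) standard deviation."""
    n = len(x)
    mean_x = sum(x) / n
    sd = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    return [(a - mean_x) / sd for a in x]


def has_joint_effect(taxon, biocide, climate):
    """True if the combined variable correlates with the taxon more strongly
    (in absolute value) than the biocide or the climate variable alone.

    Assumption of this sketch: the combination is an equal-weight sum of
    z-scores, which is only sensible when both variables act in the same
    direction on the taxon.
    """
    combined = [b + c for b, c in zip(zscore(biocide), zscore(climate))]
    r_combined = abs(pearson(taxon, combined))
    r_single = max(abs(pearson(taxon, biocide)), abs(pearson(taxon, climate)))
    return r_combined > r_single
```

      In the analysis described in the response, each correlation would additionally need padj < 0.05 from the permutation test before the comparison is interpreted.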

      The level of confidence associated with results is not made explicit. The reader is given no information on the amount of variability involved in the observations, or the level of uncertainty associated with model estimates.

      As we didn’t use traditional statistical approaches, confidence level estimation in the traditional sense is not possible. Instead, we used permutation tests and adjusted P-values to identify significant correlations in the data. These approaches are more robust than traditional statistics for integrating and discovering complex, group-wise patterns among high-dimensional datasets. While most forms of machine learning require large sample sizes, sCCA uses fewer observations to identify the most correlated components among data matrices and captures the multivariate variability of the most important features.
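      As a rough illustration of how permutation tests and adjusted P-values stand in for classical confidence estimates, the sketch below computes a two-sided permutation p-value for a Pearson correlation and then adjusts a set of p-values. The response does not name the adjustment method; Benjamini-Hochberg is used here purely as an assumption, and the function names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    y is shuffled repeatedly to build the null distribution of |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never zero.
    return (hits + 1) / (n_permutations + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

      With 10,000 permutations, as in the analysis described, the smallest attainable raw p-value under this estimator is 1/10,001.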

      The major implications of the findings for regulatory ecological assessment are missed. Regulators may not be primarily interested in identifying past "ecosystem shifts". What they need are approaches which give greater confidence in monitoring outcomes by better reflecting the ecological impact of contemporary environmental change and ecosystem management. The real value of the work in this regard is that: (1) it shows that current approaches are inappropriate due to the relatively stable nature of the indicators used by regulators, despite large changes in pollutant inputs; (2) it presents some better alternatives, including both taxonomic and functional indicators; and (3) it provides a new reference (or baseline) for regulators by characterizing "semi-pristine" conditions.

      We thank the reviewer for this suggestion, which we have included in the main text (L451-461).

      Reviewer #2 (Public Review):

      Results - They are brief and should expand some more. Particularly, there are no results regarding metabarcoding data (number of reads, filtering etc.). These details are important to know the quality of the data which represents the bulk of the analyses. Even the supplementary material gives little information on the metabarcoding results (e.g. number of ASVs - whether every ASV of each family were pooled etc.).

      We thank the reviewer for this suggestion. We have included a paragraph in results reporting read numbers and other statistics. The filtering criteria and handling of samples can be found in methods (L658-661; L670-675). As explained in methods the taxonomy was assigned using qiime feature-classifier classify-sklearn and used at family level where possible. When classification was not possible at family level because of incomplete/missing information in the online database or a poor match to reference database, the lowest classification possible was used.

      The drivers of biodiversity change section could be restructured and include main text tables showing the families positively or negatively correlated with the different variables (akin to table S2 but simplified).

      As there are over 180 unique families/taxonomic units correlated with at least one biocide or environmental variable, a simplified version of this table would be too large to include in the main text. Therefore, we prefer to keep this information in supplementary table 2 complete with correlation statistics.

      We thank the reviewers for providing detailed feedback on the manuscript and respond to their suggestions as follows:

      Reviewer #1 (Recommendations For The Authors):

      Thank you for the opportunity to review your manuscript, which I found interesting and enjoyable to read. Here are some suggestions for improving it.

      Remove spaces before citations in text.

      Lines 51-53: "Community-level biodiversity reliably explained freshwater ecosystem shifts whereas traditional quality indices (e.g. Trophic Diatom Index) and physicochemical parameters proved to be poor metrics for these shifts." Seems to be the wrong way around / not clear???

      Rephrased to clarify.

      Line 54: Should be "...advocates the use of..." or "...demonstrates the advantages of..."

      Done, thanks for the suggestion.

      Line 62: Spell out numbers <10, i.e. "sixth mass extinction"

      Done, thank you.

      Lines 66-72: These sentences lack clarity. It's not clear that "experimental manipulation of biodiversity" hasn't involved investigation of "multi-trophic changes". By the third of these four sentences it is not clear what "they" is referring to. And in the fourth sentence, "these holistic studies" are not defined. Perhaps it would suffice to say that experiments have so far focused primarily on a single trophic level and largely neglected freshwater systems.

      We have rephrased to improve clarity.

      Line 81: Delete unnecessary bracket

      Done, thank you.

      Line 82: "a minority of freshwater ecosystems" sounds as if you're saying that few freshwater ecosystems are represented in BioTIME, which seems obvious and would also apply to terrestrial and marine systems. Do you mean that freshwater ecosystems are not well represented in the data?

      We have clarified the sentence, thanks.

      Line 106: Resolve issue with citation in text at the end of the sentence (repeated at line 109 and possibly other lines).

      Done, thank you.

      Line 116: By ">1999s" do you mean 1990s?

      This was a typo; it was supposed to be >1999.

      Line 120: The reader would benefit greatly from a brief explanation of explainable network models and multimodal learning in the introduction. Why are these the right tools to use? How do they work in this context? Figure 1 helps to some extent but needs more commentary in the text.

      We have included an explanation of the explainable network models and multimodal learning and how their use can be beneficial to the study of diverse data types.

      Line 144: Here and throughout the text the language could be much more efficient and readable. "Alpha diversity" does not require a definite article. Furthermore, when referring to significance it is convention to state the p-value, test statistic and test used.

      As there are different p-values for each barcode, we have included them in legend to Supplementary Fig. 1 to avoid crowding the main text. We prefer to leave the text unchanged for this reason.

      Line 155: "The primary producer's composition" is grammatically awkward and less suitable than "the composition of primary producers". This kind of awkwardness occurs again at line 285 ("diatom's") and possibly in other parts of the manuscript.

      Thanks, corrected.

      Line 169: The statement that this family was "relatively more abundant" needs a little more explanation. What is it relative to - other groups or to previous stages?

      More abundant than in the other phases – the sentence has been modified.

      Line 179: Nested brackets are unnecessary and affect readability. This could simply be a new sentence, i.e. "For example, Nitrospiraceae (nitrite oxidizers)..."

      Done, thanks.

      Line 215: "Functional biodiversity", which implies that some biodiversity is functional and some not, does not seem an appropriate term to describe the results you present in this section. Simply "functioning of the prokaryotic community" would suffice.

      Thanks, done.

      Line 214-233: This section may be inaccessible for many readers. For example, what are Kegg Orthologs and what role do they play in the functioning of a lake ecosystem? The explanation comes later in the paragraph but there needs to be a gentler introduction before diving into specific technical concepts.

      We appreciate this comment and have included a short explanation of what KEGG and KO terms mean.

      Supplementary Figure 3: It would be helpful to superimpose the lake stages here, as done in Figure 2.

      The figure has been updated with coloured data points corresponding to each phase, as in supplementary figure 1.

      Line 265: Should be "19 of which were identified..."

      Done, thanks.

      Line 284: "Predominantly" rather than "prominently"?

      Done

      Line 242-316: This section is good in that it identifies and ranks individual biocides and climate variables but there is no information on non-additive interactions (e.g., synergistic, antagonistic). Could the authors at least comment on why this was not done or not necessary, and what uncertainties this omission could introduce into the results?

      This was a misunderstanding based on our use of the term synergistic in this context. The approach by which we define a synergistic or joint effect of two environmental variables on a taxonomic group is explained in the methods section. This analysis is based on climate variables and biocide types contributing the largest covariances in the correlation analysis explained in Supplementary Fig. 5; Step 4. The combined effect of two environmental variables on a taxon was considered to be significant if the biocide type and the climate variable were each significantly correlated with the taxon over the same time window, and their average Pearson correlation was > 0.5 with padj < 0.05 (SWC analysis with 10,000 permutations) – this is shown in Supplementary Fig. 5; Step 6. The biocide type and the climate variable were interpreted to have an additive effect on a given taxon if the linear combination of the biocide type and the climate variable had a larger Pearson correlation coefficient than each of the correlations between the family and the biocide type and the family and the climate variable individually, in the same time interval with padj < 0.05 (with 10,000 permutations in the SWC analysis). We have replaced synergistic with joint effect to avoid confusion.

      Figure 4: These 3-D plots are very hard to read. Without additional features (e.g. shadows on each plane, or lines connecting points to planes) it is impossible for the viewer to tell where the points are located on each axis.

      We have created interactive 3D plots here: https://environmental-omicsgroup.github.io/Biodiversity_Monitoring/.

      Figure 5: Legend entry should be "summer precipitation" not "precipitations". "Additive effect" rather than "joint effect" would be more consistent with the main text.

      “Precipitations” has been updated to “precipitation” where relevant throughout. We left ‘joint effect’ and unified the main text, responding to a previous comment of this reviewer on the meaning of synergistic effects in our study.

      Line 348: Doesn't your approach also require specialist skills? I often feel that the "traditional" versus "molecular" monitoring debate misses this point. Some comment on the training and development needs for those interested in applying the sedaDNA approach would be welcome. Otherwise it is an unfair comparison.

      Whereas the application of high throughput sequencing technologies requires training, these technologies are well established with publicly available standard operating procedures. As compared to direct observations, high throughput sequencing provides replicable results regardless of the operator. Moreover, the application of metabarcoding to sedaDNA or more generally eDNA can be outsourced to established environmental services, removing the need for training if it is a limiting factor. The above has been included in discussion.

      Line 391: "Significantly did" what? "Did significantly change over time" would be better.

      Done, thanks.

      Line 407: Should be "an indicator of..." and "did not significantly change over time..."

      Done, thanks.

      Line 408-410: Regulators are not necessarily interested in identifying past "ecosystem shifts", so this does not seem to be the best way to contrast the capabilities of the sedaDNA approach with those of LTDI2. The real value of this work, in my opinion, is threefold. First, it shows that the reliance on diatoms as indicators of ecological status is inappropriate due to the relatively stable nature of diatom communities in the face of large environmental changes. Second, it presents some better alternatives, including both taxonomic and functional indicators. And third, it provides a new reference point for regulators by characterising "semi-pristine" conditions.

      Thanks for the insightful suggestion. We agree with the reviewer on the advantages and have spelled them out in the resubmitted manuscript.

      Line 445: What are "housekeeping functions"? I checked the Cuenca-Cambronero paper cited but did not find the term there.

      Housekeeping functions are essential basic cellular functions that are evolutionarily conserved. They are more commonly present in public databases because they have been characterised in a number of model species (e.g. Drosophila, C. elegans and Mus musculus). Our reference is not to the Cuenca-Cambronero paper, but to Mi et al., describing the reference database PANTHER. We included the definition of housekeeping functions in the main text.

      Line 449: Briefly state the main functional changes found here.

      Examples have been included.

      Lines 451-452: Whilst this statement may be found in the cited source, most readers I suspect would not identify with it. Indeed, one could argue that most of freshwater ecology has been dedicated to this very task (documenting chemical impacts on biodiversity)! A more balanced view is needed here.

      The sentence the reviewer refers to also includes a reference to climate change. Climate change and chemical pollution are the two most common causes of biodiversity loss, and not only in freshwater ecosystems.

      Lines 463-466: These examples both point to non-additive (synergistic) effects, which were not assessed in the current study.

      Please refer to our explanation above about the inappropriate use of synergistic and, here, additive. We have altered the text throughout to use joint effects as we do not investigate synergistic, antagonistic and additive effects as traditionally described in ecology.

      Lines 472-474: This sentence is unclear. Do you mean that this approach surpasses others in terms of reliability? If so, I don't believe this has been demonstrated in the paper.

      We apologise. The word ‘reliability’ should not have been in the text. We have improved the clarity of this sentence.

      Lines 474-482: In these sentences it is unclear whether or not you are talking about your method or contrasting it with another method(s). If the latter, which method or methods are you referring to?

      We have fixed this sentence to better reflect that our algorithm provides a high degree of confidence that surpasses state-of-the-art analyses, which predominantly identify patterns of co-occurrence of taxa within communities (e.g. Correlation-Centric Network).

      Line 631: Should be "Physico-chemical variables". I have not extensively checked the rest of the methods for such errors.

      Thank you, the text has been changed where present.

      Reviewer #2 (Recommendations For The Authors):

      Introduction Line 80 remove extra ')'

      Done, thank you.

      Line 81 rephrase e.g includes few freshwater ecosystems

      We modified this sentence also following Reviewer #1

      Line 83 although, instead of whereas?

      Done, thanks.

      Line 106 formatting reference issue

      Line 109 same as above

      Thank you, noted.

      Results

      Line 141 - 144 how was the sampling of the sediment performed over the 100 year core? Every year? Every 5 years? Or were they pooled to represent the (as of yet unlisted) phases?

      The reviewer is correct that details are not provided here. They are in methods. We have added some text to explain the basic concepts of how the core was obtained and sliced and refer the reader to the method section for more details.

      Line 154 the authors have not yet explicitly listed the lake phases, so it is difficult to refer to them now.

      Noted, the addition of a short explanation at the beginning of the results section should take care of this issue.

      Line 216 - may be worth briefly explaining KEGG orthologs and how these relate to functional biodiversity.

      We thank the reviewer. Also responding to a similar comment from Reviewer #1, we included a description of KO terms and their links to functional biodiversity.

      Lines 249 - 260 instead of a supplementary table, it could remain in the main text

      Supplementary table 2 is a multi-tab table including information for each region amplified here. It is not possible to include this table in the main text.

      Materials and Methods Due to the formatting of the manuscript (results & discussion before materials and methods), many of the results are not clearly understood without having to visit the M&M section. Particularly, how the biocide types were obtained (Historic records plus persistence of DDT in sediments). This could be resolved by including a few sentences on how the data was gathered in the results section. Overall, materials and methods are sufficient, however, it is not clear how many of the 37 metabarcoding samples correspond to which of the lake phases. Finally, I suggest a better organization of M&Ms by having subheadings for each section. For example, under Biodiversity fingerprinting across 100 years, one subheading could be DNA extraction and sequencing, another subheading could be bioinformatics.

      We thank the reviewer for the suggestion. To alleviate the issues linked to the methods section coming after the results section, we have introduced a short explanation of the sediments core and the lake phases at the beginning of the results section. A description of the climate and chemical data has been included at the beginning of the section ‘Drivers of biodiversity change’ in results. Subheadings were introduced in methods as suggested.

    1. Author Response

      Reviewer #1 (Public Review):

      In the best genetically and biochemically understood model of eukaryotic DNA replication, the budding yeast, Saccharomyces cerevisiae, the genomic locations at which DNA replication initiates are determined by a specific sequence motif. These motifs, or ARS elements, are bound by the origin recognition complex (ORC). ORC is required for loading of the initially inactive MCM helicase during origin licensing in G1. In human cells, ORC does not have a specific sequence binding domain and origin specification is not specified by a defined motif. There have thus been great efforts over many years to try to understand the determinants of DNA replication initiation in human cells using a variety of approaches, which have gradually become more refined over time.

      In this manuscript Tian et al. combine data from multiple previous studies using a range of techniques for identifying sites of replication initiation to identify conserved features of replication origins and to examine the relationship between origins and sites of ORC binding in the human genome. The authors identify a) conserved features of replication origins e.g. association with GC-rich sequences, open chromatin, promoters and CTCF binding sites. These associations have already been described in multiple earlier studies. They also examine the relationship of their determined origins and ORC binding sites and conclude that there is no relationship between sites of ORC binding and DNA replication initiation. While the conclusions concerning genomic features of origins are not novel, if true, a clear lack of colocalization of ORC and origins would be a striking finding.

      Thank you. That is where the novelty of the paper lies.

      However, the majority of the datasets used do not report replication origins, but rather broad zones in which replication origins fire. Rather than refining the localisation of origins, the approach of combining diverse methods that monitor different objects related to DNA replication leads to a base dataset that is highly flawed and cannot support the conclusions that are drawn, as explained in more detail below.

      We are using the narrowly defined SNS-seq peaks as the gold standard origins and making sure to focus in on those that fall within the initiation zones defined by other methods. The objective is to make a list of the most reproducible origins. Unlike what the reviewer states, this actually refines the dataset to focus on the SNS origins that have also been reproduced by the other methods in multiple cell lines. We will change the last box of Fig. 1A to say: Identify reproducible SNS-seq origins that are contained in IZs defined by Repli-seq, OK-seq and Bubble-seq. These are the “shared origins”. This and the Fig. 2B (as it is) will make our strategy clearer.

      Methods to determine sites at which DNA replication is initiated can be divided into two groups based on the genomic resolution at which they operate. Techniques such as bubble-seq, ok-seq can localise zones of replication initiation in the range ~50kb. Such zones may contain many replication origins. Conversely, techniques such as SNS-seq and ini-seq can localise replication origins down to less than 1kb. Indeed, the application of these different approaches has led to a degree of controversy in the field about whether human replication does indeed initiate at discrete sites (origins), or whether it initiates randomly in large zones with no recurrent sites being used. However, more recent work has shown that elements of both models are correct i.e. there are recurrent and efficient sites of replication initiation in the human genome, but these tend to be clustered and correspond to the demonstrated initiation zones (Guilbaud et al., 2022).

      These different scales and methodologies are important when considering the approach of Tian et al. The premise that combining all available data from five techniques will increase accuracy and confidence in identifying the most important origins is flawed for two principal reasons. First, as noted above, of the different techniques combined in this manuscript, only SNS-seq can actually identify origins rather than initiation zones. It is the former that matters when comparing sites of ORC binding with replication origin sites if a conclusion is to be drawn that the two do not co-localise.

      Exactly. So the reviewer should agree that our method of finding SNS-seq peaks that fall within initiation zones actually refines the origins to find the most reproducible origins. We are not losing the spatial precision of the SNS-seq peaks.

      Second, the authors give equal weight to all datasets. Certainly, in the case of SNS-seq, this is not appropriate. The technique has evolved over the years and some earlier versions have significantly different technical designs that may impact the reliability and/or resolution of the results, e.g. in Foulk et al. (Foulk et al., 2015), lambda exonuclease was added to single stranded DNA from a total genomic preparation rather than purified nascent strands, which may lead to significantly different digestion patterns (i.e. underdigestion). Curiously, the authors do not make the best use of the largest SNS-seq dataset (Akerman et al., 2020) by ignoring these authors' separation of core and stochastic origins. By blending all data together any separation of signal and noise is lost. Further, I am surprised that the authors have chosen not to use data and analysis from a recent study that provides subsets of the most highly used and efficient origins in the human genome, at high resolution (Guilbaud et al., 2022).

      1) We are using the data from Akerman et al., 2020: Dataset GSE128477 in Supplemental Table 1. We can examine the core origins defined by the authors to check its overlap with ORC binding.

      2) To take into account the refinement of the SNS-seq methods through the years, we actually included in our study only those SNS-seq studies after 2018, well after the lambda exonuclease method was introduced. Indeed, all 66 of SNS-seq datasets we used were obtained after the lambda exonuclease digestion step. To reiterate, we recognize that there may be many false positives in the individual origin mapping datasets. Our focus is on the True positives, the SNS-seq peaks that have some support from multiple SNS-seq studies AND fall within the initiation zones defined by the independent means of origin mapping (described in Fig. 1A and 2B). These True positives are most likely to be real and reproducible origins and should be expected to be near ORC binding sites.

      We will change the last box of Fig. 1A to say: Identify reproducible SNS-seq origins that are contained in IZs defined by Repli-seq, OK-seq and Bubble-seq. These are the “Shared origins”.

      Ini-seq by Torsten Krude and co-workers (Guilbaud et al., 2022) does NOT use Lambda exonuclease digestion. So using Ini-seq defined origins is at odds with the suggestion above that we focus only on SNS-seq datasets that use Lambda exonuclease. However, Ini-seq identifies a much smaller subset of SNS-seq origins, so we will do the analysis with just that smaller set in the revision of the paper.

      References:

      Akerman I, Kasaai B, Bazarova A, Sang PB, Peiffer I, Artufel M, Derelle R, Smith G, Rodriguez-Martinez M, Romano M, Kinet S, Tino P, Theillet C, Taylor N, Ballester B, Méchali M (2020) A predictable conserved DNA base composition signature defines human core DNA replication origins. Nat Commun, 11: 4826

      Foulk MS, Urban JM, Casella C, Gerbi SA (2015) Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res, 25: 725-735

      Guilbaud G, Murat P, Wilkes HS, Lerner LK, Sale JE, Krude T (2022) Determination of human DNA replication origin position and efficiency reveals principles of initiation zone organisation. Nucleic Acids Res, 50: 7436-7450

      Reviewer #2 (Public Review):

      Tian et al. perform a meta-analysis of 113 genome-wide origin profile datasets in humans to assess the reproducibility of experimental techniques and shared genomics features of origins. Techniques to map DNA replication sites have quickly evolved over the last decade, yet little is known about how these methods fare against each other (pros and cons), nor how consistent their maps are. The authors show that high-confidence origins recapitulate several known features of origins (e.g., correspondence with open chromatin, overlap with transcriptional promoters, CTCF binding sites). However, surprisingly, they find little overlap between ORC/MCM binding sites and origin locations.

      Overall, this meta-analysis provides the field with a good assessment of the current state of experimental techniques and their reproducibility, but I am worried about: (a) whether we've learned any new biology from this analysis; (b) how binding sites and origin locations can be so mismatched, in light of numerous studies that suggest otherwise; and (c) some methodological details described below.

      Major comments:

      Line 26: "0.27% were reproducibly detected by four techniques" -- what does this mean? Does the fragment need to be detected by ALL FOUR techniques to be deemed reproducible?

      If the reproducible SNS-seq peaks are included in the reproducible initiation zones found by the other methods, then we consider it reproducible across datasets. The strategy is to focus our analysis on the most reproducible SNS-seq peaks that happen to be in reproducible initiation zones. It is the best way to confidently identify a very small set of true positive origins.

      And what if the technique detected the fragment in only 1 of N experiments conducted; does that count as "detected"?

      A reproducible SNS-seq origin has been reproduced above a statistical threshold of 20 reproductions. A threshold of reproduction in 20 datasets out of 66 SNS-seq datasets gives an FDR of <0.1. This is explained in Fig. 2a and Supplementary Fig. S2. For the initiation zones, we considered a Zone even if it appears in only 1 of N experiments, because N is usually small. This relaxed method for selecting the initiation zones gives the best chance of finding SNS-seq peaks that are reproduced by the other methods.

      Later in Methods, the authors (line 512) say, "shared origins ... occur in sufficient number of samples" but what does sufficient mean?

      Sufficient means that SNS-seq origin was reproducibly detected in ≥ 20 datasets and was included in any initiation zone defined by three other techniques.
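
      The definition above can be read as a simple interval filter. The sketch below is illustrative and is not the pipeline used in the paper: the tuple layouts, the function name `shared_origins`, the toy coordinates, and the interpretation of "included in" as full containment of the peak within a zone are assumptions.

```python
def shared_origins(peaks, zones, min_occupancy=20):
    """Return SNS-seq peaks that pass both criteria for a shared origin.

    peaks: iterable of (chrom, start, end, occupancy), where occupancy is
           the number of SNS-seq datasets in which the peak was detected.
    zones: iterable of (chrom, start, end) initiation zones pooled from
           the other techniques.
    Assumption: a peak counts as "included" only if it lies entirely
    within a single zone.
    """
    zones_by_chrom = {}
    for chrom, start, end in zones:
        zones_by_chrom.setdefault(chrom, []).append((start, end))

    shared = []
    for chrom, start, end, occupancy in peaks:
        if occupancy < min_occupancy:
            continue  # not reproduced in enough SNS-seq datasets
        for zone_start, zone_end in zones_by_chrom.get(chrom, []):
            if zone_start <= start and end <= zone_end:
                shared.append((chrom, start, end, occupancy))
                break  # one containing zone is enough
    return shared


# Hypothetical toy input.
peaks = [
    ("chr1", 1000, 1300, 25),   # reproducible and inside a zone
    ("chr1", 5000, 5300, 12),   # inside a zone but below the threshold
    ("chr1", 90000, 90300, 40), # reproducible but outside every zone
    ("chr2", 200, 500, 20),     # exactly at the threshold, inside a zone
]
zones = [("chr1", 0, 50000), ("chr2", 0, 10000)]
result = shared_origins(peaks, zones)
```

      A linear scan over zones is adequate for a toy example; for genome-scale inputs an interval index or a sorted sweep would be the more practical choice.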

      Then on line 522, they use a threshold of "20" samples, which seems arbitrary to me. How are these parameters set, and how robust are the conclusions to these settings? An alternative to setting these (arbitrary) thresholds and discretizing the data is to analyze the data continuously; i.e., associate with each fragment a continuous confidence score.

      We explained Fig. 2a and Supplementary Fig. S2 in the text as follows: The occupancy score of each origin defined by SNS-seq (Supplementary Fig. 2a) counts the frequency at which a given origin is detected in the datasets under consideration. For the random background, we assumed that the number of origins confirmed by increasing occupancy scores decreases exponentially (see Methods and Supplementary Table 2). Plotting the number of origins with various occupancy scores when all SNS-seq datasets published after 2018 are considered together (the union origins) shows that the experimental curve deviates from the random background at a given occupancy score (Fig. 2a). The threshold occupancy score of 20 is the point where the observed number of origins deviates from the expected background number (with an FDR < 0.1) (Fig. 2a). In the Methods: In other words, the number of observed origins with occupancy score greater than 20 is 10 times more than expected in the background model. This approach is statistically sound and described by us in (Fang et al. 2020).
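
      To make the thresholding logic concrete, a minimal sketch follows. It is not the procedure of Fang et al. 2020 or the code used for Fig. 2a: the log-linear least-squares fit, the choice of which occupancy scores are treated as background, the definition of the FDR as the expected-to-observed ratio of origins at or above a score, and the toy counts are all assumptions for illustration.

```python
import math


def fit_exponential(counts, fit_scores):
    """Fit log(count) = a + b * score by least squares over fit_scores."""
    xs = list(fit_scores)
    ys = [math.log(counts[s]) for s in xs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = my - b * mx
    return a, b


def occupancy_threshold(counts, fit_scores, max_fdr=0.1):
    """Smallest score k at which expected/observed origins (score >= k) < max_fdr.

    counts: dict mapping occupancy score -> number of origins observed
            with exactly that score.
    fit_scores: scores assumed to be dominated by random background.
    """
    a, b = fit_exponential(counts, fit_scores)
    scores = sorted(counts)
    for k in scores:
        observed = sum(counts[s] for s in scores if s >= k)
        expected = sum(math.exp(a + b * s) for s in scores if s >= k)
        if observed > 0 and expected / observed < max_fdr:
            return k
    return None  # the observed curve never separates from background


# Hypothetical toy counts: an exponential background (halving with each
# score) plus 100 extra origins at each score from 6 upwards.
toy_counts = {1: 512, 2: 256, 3: 128, 4: 64, 5: 32,
              6: 116, 7: 108, 8: 104, 9: 102, 10: 101}
threshold = occupancy_threshold(toy_counts, fit_scores=range(1, 6))
```

      With these toy counts the expected-to-observed ratio first drops below 0.1 at a score of 6, which is where the reproducible signal was added to the background.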

      Line 20: "50,000 origins" vs "7.5M 300bp chromosomal fragments" -- how do these two numbers relate? How many 300bp fragments would be expected given that there are ~50,000 origins? (i.e., how many fragments are there per origin, on average)? This is an important number to report because it gives some sense of how many of these fragments are likely nonsense/noise. The authors might consider eliminating those fragments significantly above the expected number, since their inclusion may muddle biological interpretation.

      I think we confused the reviewer by the way we wrote the abstract. The 50,000 origins that are mentioned in the abstract is the hypothetical expected number of origins that have to fire to replicate the whole 6x10^9 base diploid genome based on the average inter-origin distance of 10^5 bases (as determined by molecular combing). The 7.5M 300 bp fragments are the genomic regions where the 7.5M union SNS-seq-defined origins are located. Clearly, that is a lot of noise, some because of technical noise and some due to the fact that origins fire stochastically. Which is why our paper focuses on a smaller number of reproducible origins, the 20,250 shared origins. Our analysis is on the 20,250 shared origins, and not on all 7.5M union origins. Thus, we are not including the excess of non-reproducible (stochastic?) origins in our analysis.

      The revised abstract in the revised paper will say: “Based on experimentally determined average inter-origin distances of ~100 kb, DNA replication initiates from ~50,000 origins on human chromosomes in each cell-cycle. The origins are believed to be specified by binding of factors like the Origin Recognition Complex (ORC) or CTCF or other features like G-quadruplexes. We have performed an integrative analysis of 113 genome-wide human origin profiles (from five different techniques) and 5 ORC-binding site datasets to critically evaluate whether the most reproducible origins are specified by these features. Out of ~7.5 million union origins identified by 66 SNS-seq datasets, only 0.27% were reproducibly contained in initiation zones identified by three other techniques (20,250 shared origins), suggesting extensive variability in origin usage and identification in different circumstances.”

      Line 143: I'm not terribly convinced by the PCA clustering analysis, since the variance explained by the first 2 PCs is only ~25%. A more robust analysis of whether origins cluster by cell type, year etc is to simply compute the distribution of pairwise correlations of origin profiles within the same group (cell type, year) vs the correlation distribution between groups. Relatedly, the authors should explain what an "origin profile" is (line 141). Is the matrix (to which PCA is applied) of size 7.5M x 113, with a "1" in the (i,j) position if the ith fragment was detected in the jth dataset?

      The reviewer is correct about how we did the PCA and have now included the description in the Methods. We will also do the pairwise correlations the way the reviewer suggests (a) by techniques, (b) by cell types (SNS-seq), (c) by year of publication (SNS-seq).
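
      A sketch of the within-group versus between-group comparison requested by the reviewer is given below. It is illustrative only: the function names, the toy profiles, and the use of the Pearson correlation on binary presence/absence vectors are assumptions, and the real matrix would have millions of rows rather than eight.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def within_between(profiles, groups):
    """Split pairwise correlations by whether both datasets share a group.

    profiles: dict mapping dataset name -> binary vector, with 1 where the
              i-th fragment was detected in that dataset.
    groups:   dict mapping dataset name -> label (technique, cell type or
              year of publication).
    """
    within, between = [], []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if groups[a] == groups[b]:
            within.append(r)
        else:
            between.append(r)
    return within, between


# Hypothetical toy profiles over eight fragments.
profiles = {
    "A1": [1, 1, 1, 0, 0, 0, 1, 0],
    "A2": [1, 1, 0, 0, 0, 0, 1, 0],
    "B1": [0, 0, 1, 1, 1, 0, 1, 0],
    "B2": [0, 0, 1, 1, 1, 1, 0, 0],
}
groups = {"A1": "cell type A", "A2": "cell type A",
          "B1": "cell type B", "B2": "cell type B"}
within, between = within_between(profiles, groups)
mean_within = sum(within) / len(within)
mean_between = sum(between) / len(between)
```

      If datasets cluster by the chosen label, the within-group distribution should sit clearly above the between-group distribution; comparing the two distributions avoids relying on the small fraction of variance captured by the first two principal components.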

      It's not clear to me what new biology (genomic features) has been learned from this meta-analysis. All the major genomic features analyzed have already been found to be associated with origin sites. For example, the correspondence with TSS has been reported before:

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6320713/

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547456/

      So what new biology has been discovered from this meta-analysis?

      The new biology can be summarized as: (a) We can identify a set of reproducible (in multiple datasets and in multiple cell lines) SNS-seq origins that also fall within initiation zones identified by completely independent methods. These may be the best origins to study in the midst of the noise created by stochastic origin firing. (b) The overlap of these True Positive origins with known ORC binding sites is tenuous. So either all the origin mapping data, or all the ORC binding data has to be discarded, or this is the new biological reality in mammalian cancer cells: on a genome-wide scale the most reproduced origins are not in close proximity to ORC binding sites, in contrast to the situation in yeast. (c) All the features that have been reported to define origins (CTCF binding sites, G quadruplexes etc.) could simply be from the fact that those features also define transcription start sites (TSS), and origins prefer to be near TSS because of the favorable chromatin state.

      Line 250: The most surprising finding is that there is little overlap between ORC/MCM binding sites and origin locations. The authors speculate that the overlap between ORC1 and ORC2 could be low because they come from different cell types. Equally concerning is the lack of overlap with MCM. If true, these are potentially major discoveries that butt heads with numerous other studies that have suggested otherwise. More needs to be done to convince the reader that such a mismatch is true. Some ideas are below:

      Idea 1) One explanation given is that the ORC1 and ORC2 data come from different cell types. But there must be a dataset where both are mapped in the same cell type. Can the authors check the overlap here? In Fig S4A, I would expect the circles to not only strongly overlap but to also be of roughly the same size, since both ORC's are required in the complex. So something seems off here.

      We agree with the reviewer that there is something “off here”. Either the techniques that report these sites are all wrong, or the biology does not fit into the prevailing hypothesis. One secret in the ORC ChIP field that our lab has struggled with for quite some time is that the various ORC subunits do not necessarily ChIP-seq to the same sites. The poor overlap between the binding sites of subunits of the same complex suggests either that the subunits do not always bind to the chromatin as a six-subunit complex or that all the ChIP-seq data in the literature are suspect. We provide in Supplementary Fig. S4A examples of true positive complexes (SMARCA4/ARID1A, SMC1A/SMC3, EZH2/SUZ12), whose subunits ChIP-seq to a large fraction of common sites. As shown in Supplementary Fig. S4C, we do not have ORC1 and ORC2 ChIP-seq data from the same cell type. We have ORC1 ChIP-seq and SNS-seq data from HeLa cells and ORC2 ChIP-seq and origins from K562 cells, and so will add the proximity/overlap of the binding sites to the origins in the same cell type in the revision.

      Idea 2) Another explanation given is that origins fire stochastically. One way to quantify the role of stochasticity is to quantify the overlap of origin locations performed by the same lab, in the same year, in the same experiment, in the same cell type -- i.e., across replicates -- and then compute the overlap of mapped origins. This would quantify how much mis-match is truly due to stochasticity, and how much may be due to other factors.

      A given lab may have superior reproducibility compared to the entire field. But the notion of stochasticity is well accepted in the field because of this observation: the average inter-origin distance measured by single-molecule techniques like molecular combing is ~100 kb, but the average inter-origin distance measured on a population of cells (same cell line) is ~30 kb. The only explanation is that in a population of cells many origins can fire, but in a given cell on a given allele, only about one-third of those possible origins fire. This is why we did not worry about the lack of reproducibility between cell lines, labs etc., but instead focused on those SNS-seq origins that are reproducible over multiple techniques and cell lines.
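
      The arithmetic behind the "one-third" estimate can be made explicit. The short sketch below simply takes the ratio of the two inter-origin distances quoted above; the function name is hypothetical, and the calculation assumes that both distances are averages over the same genomic regions.

```python
def fraction_of_origins_fired(single_molecule_kb, population_kb):
    """Fraction of population-level origins used on one allele in one cell.

    Over a region of length L kb, the population shows about L / population_kb
    potential origins, while a single molecule fires about L / single_molecule_kb,
    so the fraction fired is population_kb / single_molecule_kb.
    """
    return population_kb / single_molecule_kb


# Values quoted in the response: ~100 kb (single molecule) and ~30 kb (population).
fraction = fraction_of_origins_fired(single_molecule_kb=100, population_kb=30)
print(fraction)
```

      With the quoted distances the ratio is 0.3, i.e. roughly one in three potential origins fires on a given allele in a given cell.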

      Idea 3) A third explanation is that MCMs are loaded further from origin sites in human than in yeast. Is there any evidence of this? How far away does the evidence suggest, and what if this distance is used to define proximity?

      MCMs, of course, have to be loaded at an origin at the time the origin fires because MCMs provide the core of the helicase that starts unwinding the DNA at the origin. Thus, the lack of proximity of MCM binding sites with origins can be because the most detected MCM sites (where MCM spends the most time in a cell population) do not correspond to where it is first active to initiate origin firing. This has been discussed. MCMs may be loaded far from the origin site, but because of their ability to move along the chromatin, they have to move to the origin site at some point to fire the origin.

      Idea 4) How many individual datasets (i.e., those collected and published together) also demonstrate the feature that ORC/MCM binding locations do not correlate with origins? If there are few, then indeed, the integrative analysis performed here is consistent. But if there are many, then why would individual datasets reveal one thing, but integrative analysis reveal something else?

      We apologize for this oversight. In the revised manuscript we will discuss PMC3530669, PMC7993996, PMC5389698, PMC10366126. None of them has addressed what we are addressing, which is whether the small subset of the most reproducible origins is proximal to ORC or MCM binding sites, but the discussion is essential.

      Idea 5) What if you were much more restrictive when defining "high-confidence" origins / binding sites. Does the overlap between origins and binding sites go up with increasing restriction?

      We will make origins more restrictive by selecting those reproduced by 30-60 datasets. The number of origins will of course fall, but we will measure whether the proximity to ORC or MCM-binding sites increases/decreases in a statistically rigorous way.
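
      A minimal sketch of this restriction analysis is given below, under stated assumptions: origins and binding sites are reduced to single coordinates on one chromosome, "proximal" is taken to mean within a 1 kb window (an illustrative choice, not a value from the manuscript), and all names and toy numbers are hypothetical.

```python
import numpy as np


def fraction_near_sites(origin_pos, site_pos, window):
    """Fraction of origins with at least one binding site within `window` bp."""
    site_pos = np.sort(np.asarray(site_pos))
    origin_pos = np.asarray(origin_pos)
    idx = np.searchsorted(site_pos, origin_pos)
    last = len(site_pos) - 1
    left = np.abs(origin_pos - site_pos[np.clip(idx - 1, 0, last)])
    right = np.abs(site_pos[np.clip(idx, 0, last)] - origin_pos)
    return float(np.mean(np.minimum(left, right) <= window))


def proximity_by_threshold(origin_pos, support, site_pos, thresholds, window=1000):
    """For each minimum-support threshold: (origins kept, fraction near a site)."""
    origin_pos = np.asarray(origin_pos)
    support = np.asarray(support)
    out = {}
    for t in thresholds:
        keep = support >= t
        frac = fraction_near_sites(origin_pos[keep], site_pos, window) if keep.any() else float("nan")
        out[t] = (int(keep.sum()), frac)
    return out


# Toy example: four origins, the number of datasets supporting each, two binding sites.
origins = [1000, 5000, 9000, 20000]
support = [10, 35, 50, 60]
sites = [5200, 20500]
result = proximity_by_threshold(origins, support, sites, thresholds=[30, 60])
print(result)
```

      In a real analysis the same calculation would be run per chromosome, and the change in the proximal fraction across thresholds would be tested against a permuted or shuffled background.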

      Overall, I have the sense that these experimental techniques may be producing a lot of junk. If true, this would be useful for the field to know! But if not, and there are indeed "unexplored mechanisms of origin specification" that would be exciting. But I'm not convinced yet.

      It would be nice in the Discussion for the authors to comment about the trade-offs of different techniques; what are their pros and cons, which should be used when, which should be avoided altogether, and why? This would be a valuable prescription for the field.

      Thanks for the suggestion. We will do what the reviewer suggests: use cell type-specific data wherever origins have been defined by at least two methods in the same cell type, specifically reporting the percent of shared origins amongst the datasets to compare whether some methods correlate better with each other. ORC ChIP-seq and MCM ChIP-seq data do not define origins: they define the binding sites of these proteins. Thus we will discuss why the ChIP-seq sites of these protein complexes should not be used to define origins.

      Reviewer #3 (Public Review):

      Summary: The authors present a thought-provoking and comprehensive re-analysis of previously published human cell genomics data that seeks to understand the relationship between the sites where the Origin Recognition Complex (ORC) binds chromatin, where the replicative helicase (Mcm2-7) is situated on chromatin, and where DNA replication actually begins (origins). The view that these should coincide is influenced by studies in yeast where ORC binds site-specifically to dedicated nucleosome-free origins where Mcm2-7 can be loaded and remains stably positioned for subsequent replication initiation. However, this is most certainly not the case in metazoans where it has already been reported that chromatin binding sites of ORC, Mcm2-7, and origins do not necessarily overlap, likely because ORC loads the helicase in transcriptionally active regions of the genome and, since Mcm2-7 retains linear mobility (i.e., it can slide), it is displaced from its original position by other chromatin-contextualized processes (for example, see Gros et al., 2015 Mol Cell, Powell et al., 2015 EMBO J, Miotto et al., 2016 PNAS, and Prioleau et al., 2016 G&D amongst others). This study reaches a very similar conclusion: in short, they find a high degree of discordance between ORC, Mcm2-7, and origin positions in human cells.

      Strengths: The strength of this work is its comprehensive and unbiased analysis of all relevant genomics datasets. To my knowledge, this is the first attempt to integrate these observations and the analyses employed were suited for the questions under consideration.

      Thank you for recognizing the comprehensive and unbiased nature of our analysis. That the comprehensive view fails to move the field forward, which the reviewer identifies as the major weakness, is actually a strength. It should be viewed in the light that we cannot even find evidence to support the primary hypothesis: that the most reproducible origins must be near ORC and MCM binding sites. This finding will prevent the unwise adoption of ORC or MCM binding sites as surrogate markers of origins and may perhaps stimulate the field to try and improve methods of identifying ORC or MCM binding until the binding sites are found to be proximal to the most reproducible origins. The last possibility is that there are ORC- or MCM-independent modes of defining origins, but we have no evidence of that.

      Weaknesses: The major weakness of this paper is that this comprehensive view failed to move the field forward from what was already known. Further, a substantial body of relevant prior genomics literature on the subject was neither cited nor discussed. This omission is important given that this group reaches very similar conclusions as studies published a number of years ago. Further, their study seems to present a unique opportunity to evaluate and shape our confidence in the different genomics techniques compared in this study. This, however, was also not discussed.

      We will do what the reviewer suggests: use cell type-specific data wherever origins have been defined by at least two methods in the same cell type, specifically reporting the percent of shared origins amongst the datasets to compare whether some methods correlate better with each other. Thanks for the suggestion. ORC ChIP-seq and MCM ChIP-seq data do not define origins: they define the binding sites of these proteins. Thus, we will discuss why the ChIP-seq sites of these protein complexes should not be used to define origins.

      We do not cite the SNS-seq data before 2018 because of the concerns discussed above about the earlier techniques needing improvement. We will discuss other genomics data that we failed to discuss.

      We will cite the papers the reviewer names:

      Gros, Mol Cell 2015 and Powell, EMBO J. 2015 discuss the movement of MCM2-7 away from ORC in yeast and flies and will be cited. MCM2-7 binding to sites away from ORC and being loaded in vast excess of ORC was reported earlier on Xenopus chromatin in PMC193934, and will also be cited.

      Miotto, PNAS, 2016: publishes ORC2 ChIP-seq sites in HeLa (data we have used in our analysis), but does not measure ORC1 ChIP-seq sites. They say: “ORC1 and ORC2 recognize similar chromatin states and hence are likely to have similar binding profiles.” This is a conclusion based on the fact that the ChIP-seq sites in the two studies are in areas with open chromatin; it is not a direct comparison of the binding sites of the two proteins.

      Prioleau, G&D, 2016: This is a review that compared different techniques of origin identification but has no primary data to say that ORC and MCM binding sites overlap with the most reproducible origins.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study investigates the context-specificity of facial expressions in three species of macaques to test predictions for the 'social complexity hypothesis for communicative complexity'. This hypothesis has garnered much attention in recent years. A proper test of this hypothesis requires clear definitions of 'communicative complexity' and 'social complexity'. Importantly, these two facets of a society must not be derived from the same data because otherwise, any link between the two would be trivial. For instance, if social complexity is derived from the types of interactions individuals have, and different types of signals accompany these interactions, we would not learn anything from a correlation between social and communicative complexity, as both stem from the same data.

      The authors of the present paper make a big step forward in operationalising communicative complexity. They used the Facial Action Coding System to code a large number of facial expressions in macaques. This system allows decomposing facial expressions into different action units, such as 'upper lid raiser', 'upper lip raiser' etc.; these units are closely linked to activating specific muscles or muscle groups. Based on these data, the authors calculated three measures derived from information theory: entropy, specificity and prediction error. These parts of the analysis will be useful for future studies.

      The three species of macaque varied in these three dimensions. In terms of entropy, there were differences with regard to context (and if there are these context-specific differences, then why pool the data?). Barbary and Tonkean macaques showed lower specificity than rhesus macaques. Regarding predicting context from the facial signals, a random forest classifier yielded the highest prediction values for rhesus monkeys. These results align with an earlier study by Preuschoft and van Schaik (2000), who found that less despotic species have greater variability in facial expressions and usage.

      Crucially, the three species under study are also known to vary in terms of their social tolerance. According to the highly influential framework proposed by Bernard Thierry, the members of the genus Macaca fall along a graded continuum from despotic (grade 1) to highly tolerant (grade 4). The three species chosen for the present study represent grade 1 (rhesus monkeys), grade 3 (Barbary macaques), and grade 4 (Tonkean macaques).

      The authors of the present paper define social complexity as equivalent to social tolerance - but how is social tolerance defined? Thierry used aggression and conflict resolution patterns to classify the different macaque species, with the steepness of the rank hierarchy and the degree of nepotism (kin bias) being essential. However, aggression and conflict resolution are accompanied by facial gestures. Thus, the authors are looking at two sides of the same coin when investigating the link between social complexity (as defined by the authors) and communicative complexity. Therefore, I am not convinced that this study makes a significant advance in testing the social complexity for communicative complexity hypothesis. A further weakness is that - despite the careful analysis - only three species were considered; thus, the effective sample size is very small.

      Social tolerance in macaques is defined by various covarying traits, among which rates of counter-aggression and conflict resolution are only two of many included (see Thierry 2021 for a recent discussion and review). We do not deviate from Thierry’s definition of social tolerance. We simply highlight that the constellation of behavioral traits in the most tolerant macaque species results in a social environment where the outcome of social interactions is more uncertain (see introduction lines 102-114). As we argue throughout the paper, higher uncertainty can be used as a proxy for higher complexity and thus we conclude that the most tolerant macaque species have the highest social complexity. While most social behavior in macaques is accompanied by some facial behavior, we were careful to define social contexts only from the body language/behavior (e.g., lunge for aggression, grooming for affiliation) of the individuals involved and ignored the facial behavior used (see method lines 371-381). Therefore, the facial behavior of macaques (communication signals) was not used in defining either social tolerance (and by extension complexity) or the social context in which it was used. We feel like this appropriately minimizes any elements of circularity in the analysis of social and communicative complexity.

      Regarding the effective sample size of three species, we agree that it is small, and it is a limitation of this study. However, the methodology we used is applicable to any species for which FACS is available (including other non-human primates, dogs, and horses), and therefore, we hope that other datasets will complement ours in the future. Nevertheless, we now acknowledge this limitation in the discussion (lines 314-317).

      Reviewer #2 (Public Review):

      This is a well-written manuscript about a strong comparative study of diversity of facial movements in three macaque species to test arguments about social complexity influencing communicative complexity. My major criticism has to do with the lack of any reporting of inter-observer reliability statistics - see comment below. Reporting high levels of inter-observer reliability is crucial for making clear the authors have minimized chances of possible observer biases in a study like this, where it is not possible to code the data blind with regard to comparison group. My other comments and questions follow by line number:

      We agree that inter-observer coding reliability is an important piece of information. We now report in more detail the inter-observer reliability tests that we conducted on lines 384-392.

      38-40. Whereas I am an advocate of this hypothesis and have tested it myself, the authors should probably comment here, or later in the discussion, about the reverse argument - greater communicative complexity (driven by other selection pressures) could make more complicated social structures possible. This latter view was the one advocated by McComb & Semple in their foundational 2005 Biology Letters comparative study of relationships between vocal repertoire size and typical group size in non-human primate species.

      It is true that an increase in communicative complexity could allow/drive an increase in social complexity. Unfortunately our data is correlational in nature and we cannot determine the direction of causality. We added such a statement to the discussion (lines 311-314).

      72-84 and 95-96. In the paragraph here, the authors outline an argument about increasing uncertainty / entropy mapping on to increasing complexity in a system (social or communicative). In lines 95-96, though, they fall back on the standard argument about complex systems having intermediate levels of uncertainty (complete uncertainty roughly = random and complete certainty roughly = simple). Various authors have put forward what I think are useful ways of thinking about complexity in groups - from the perspective of an insider (i.e., a group member, where greater randomness is, in fact, greater complexity) vs from the perspective of an outsider (i.e., a researcher trying to quantify the complexity of the system, where it is relatively easy to explain a completely predictable or completely random system but harder to do so for an intermediately ordered or random system). This sort of argument (Andrew Whiten had an early paper that made this argument) might be worth raising here or later in the discussion? (I'm also curious where the authors' sentiments lie for this question - they seem to touch on it in lines 285-287, but I think it's worth unpacking a little more here!)

      In this study we used three measures of uncertainty (entropy, context specificity, and prediction error) to approximate complexity. However, maximum entropy or uncertainty would be achieved in a system that is completely random (and thus be considered simple). Therefore, the species with the highest entropy values, or unpredictability, could be interpreted as having a simpler communication system than a species with a moderately high entropy/unpredictability value. Our argument is that animal communication systems cannot possibly be random, otherwise they would not have evolved as signals. In systems where we know the highest entropy (or unpredictability) will not be due to randomness, as is the case with animal social interactions and communication, we can conclude that the system with the highest uncertainty is the most complex. We have now expanded upon this point in the discussion (lines 286-294). See also response to reviewer 1 below.

      115-129. See also:

      Maestripieri, D. (2005). "Gestural communication in three species of macaques (Macaca mulatta, M. nemestrina, M. arctoides): use of signals in relation to dominance and social context." Gesture 5: 57-73.

      Maestripieri, D. and K. Wallen (1997). "Affiliative and submissive communication in rhesus macaques." Primates 38(2): 127-138.

      On that note, it is probably worth discussing in this paragraph and probably later in the discussion exactly how this study differs from these earlier studies of Maestripieri. I think the fact that machine learning approaches had the most difficulty assigning crested data to context is an important methodological advance for addressing these sorts of questions - there are probably other important differences between the authors' study here and these older publications that are worth bringing up.

      Our study differs from these two studies in that they classified facial behavior into discrete categories (e.g., bared-teeth, lip-smack), whereas we adopted a bottom-up approach and made no a priori assumptions about which movements are relevant. We broke facial behavior down into its individual muscle movements (i.e., Action Units). Measuring facial behavior at the level of individual muscle movements allows for a more detailed and objective description of the complexity of facial behavior. This is a general point in advancing the study of facial behavior that is discussed in the introduction (lines 60-71) and discussion (lines 206-208). The reason we don’t draw a direct comparison with the studies above is that they had a slightly different focus. Our study was more focused on the complexity of the (facial) communication system in general rather than on comparing whether the different species use the same facial behavior in the same/different social contexts.

      220-222. What is known about visual perception in these species? Recent arguments suggest that more socially complex species should have more sensitive perceptual processing abilities for other individuals' signals and cues (see Freeberg et al. 2019 Animal Behaviour). Are there any published empirical data to this effect, ideally from the visual domain but perhaps from any domain?

      This is an interesting point. We are not aware of any studies showing differences in visual perceptions within the macaque genus. Both crested macaques and rhesus macaques are able to discriminate between individuals and facial expressions in match-to-sample tasks with comparable performances (Micheletta et al., 2015a, 2015b; Parr et al. 2008; Parr & Heinz, 2009). Similarly, several macaque species are sensitive to gaze shifts from conspecifics (Tomasello et al. 1998; Teufel et al. 2010; Micheletta & Waller, 2012).

      274-277. I am not sure I follow this - could not different social and non-social contexts produce variation in different affective states such that "emotion"-based signals could be as flexible / uncertain as seemingly volitional / information-based / referential-like signals? This issue is probably too far away from the main points of this paper, but I suspect the authors' argument in this sentence is too simplified or overstated with regard to more affect-based signals.

      Emotion-based signals could, in theory, also produce flexible signals and it is possible that some facial expressions reflect an emotional state. However, some previous studies have suggested that facial expressions are only used as a display of emotion, rather than such signals having evolved for a different function such as announcing future intentions. In our study we found that macaques used, in some cases, the same facial expressions (i.e. combination of Action Units) in at least two different social contexts that, presumably, differed in their emotional valence. Thus, it is unlikely that particular facial expressions are bound to a single emotion. We think that this is an important point to make even though it is slightly beyond the scope of our paper.

      288 on. Given there are only three species in this study, the chances of one of the species being the 'most complex' in any measure is 0.33. Although I do not believe this argument I am making here, can the authors rule out the possibility that their findings related to crested macaques are all related to chance, statistically speaking?

      We are not aware of a way to rule out this possibility. However, we believe that we are appropriately cautious throughout the paper and acknowledge that having only investigated three species is a limitation of this study in the discussion (lines 314-317, see also our response to reviewer 1 above).

      329-330. The fact that only one male rhesus macaque was assessed here seems problematic, given the balance of sexes in the other two species. Can the authors comment more on this - are the gestures they are studying here identical across the sexes?

      We agree it would have been preferable to collect data on more than one male rhesus macaque, but that was unfortunately not possible. We are not aware of any studies showing differences in the use of facial behavior between male and female rhesus macaques. If differences exist, most likely these would occur in a sexual/mating context. However, in our study we only considered affiliative (non-sexual), submissive, and aggressive contexts, where we have no a priori reason to believe that there are sex differences.

      354-371. Inter-observer reliability statistics are required here - one of the authors who did not code the original data set, or a trained observer who is not an author, could easily code a subset of the video files to obtain inter-observer reliability data. This is important for ruling out potential unconscious observer biases in coding the data.

      We agree this is an important piece of information. We now report in more detail the inter-observer reliability tests that we conducted on lines 384-392:

      “An agreement rating of >0.7 was considered good [Ekman et al 2002] and was necessary for obtaining certification. To obtain a MaqFACS coding certification, AVR, CP, and PRC coded 23 video clips of rhesus macaques and the MaqFACS codes were compared to the data of other certified coders (https://animalfacs.com).

      The mean agreement ratings obtained were 0.85, 0.73, 0.83 for AVR, CP, and PRC, respectively. In addition, AVR and CP coded 7 videos of Barbary macaques with a mean agreement rating of 0.79. AVR and PRC coded 10 videos of crested macaques with a mean agreement rating of 0.74.”

      Reviewer #1 (Recommendations For The Authors):

      Given the long debate on the concept of information exchange in animal communication, I would also recommend being more careful with the term 'exchanges of information' (line 271). Perhaps it's better to be agnostic in the context of this paper.

      As suggested, we have now changed the phrasing to focus on the behavior of the animals, rather than suggesting that information is being exchanged (lines 270-273).

      Line 281: "This result confirms the assumption that facial behaviour in macaques is not used randomly": the authors are knocking down a straw man. Nobody who has ever studied animal communication would consider that signals occur randomly. Otherwise, they would not have evolved as signals.

      Indeed, nobody claims that animal communication signals are used randomly. Although it may be taken for granted, we feel it is worthwhile to reiterate this point, given that we used relative entropy and prediction error as measures of complexity. For instance, maximum entropy or unpredictability would be achieved in a system that is completely random (and thus be considered simple). Therefore, the species with the highest entropy values, or lowest predictability, could be interpreted as having a simpler communication system than a species with a moderately high entropy value. But if we are working under the assumption that animal communication systems cannot possibly be random, then we can conclude that the species whose communication system has the highest entropy is in fact the most complex. We tried to make this justification clearer in the discussion (lines 285-294).

      I did not follow why there is a higher reliance on facial signals when predation pressure is higher. Apart from the fact that the authors cannot address this question, they may want to reconsider this idea altogether.

      We now expand on the logic of why predation pressure might affect the use of facial signals (see lines 308-309): “When predation pressure is higher, reliance on facial signals could be higher than, for example vocal signals, such as to not draw attention of predators to the signaller.”

      Technical comments:

      One methodological issue that requires clarification is what the units of analysis are. The authors write that each row in their analysis denoted an observation time of 500 ms. How many rows did the authors assemble? The authors mention a sample size of > 3000 social interactions in the abstract. How did they define social interactions? And how many 'time windows' of 500 ms were obtained? Did they take one window per interaction or several? If several, then how was this accounted for in the analysis? The reporting needs to be more accurate here. Most likely, the bootstrapping took care of biases in the data, but still, this information needs to be provided.

      We have now added some additional information to the method section. Social interactions for each context had the following definitions: “Social context was labeled from the point of view of the signaler based on their general behavior and body language (but not the facial behavior itself), during or immediately following the facial behavior. An aggressive context was considered when the signaler lunged or leaned forward with the body or head, charged, chased, or physically hit the interaction partner. A submissive context was considered when the signaler leaned back with the body or head, moved away, or fled from the interaction partner. An affiliative context was considered when the signaler approached another individual without aggression (as defined previously) and remained in proximity, in relaxed body contact, or groomed either during or immediately after the facial behavior. In cases where the behavior of the signaler did not match our context definitions, or displayed behaviors belonging to multiple contexts, we labeled the social context as unclear. Social context was determined from the video itself and/or from the matching focal behavioral data, if available.” (lines 371-382). The total duration of all social interactions per social context, and thus the number of 500ms windows/rows, have been added to Table 1 (lines 395-397). There were several 500ms windows per social interaction. All 500ms time blocks per interaction were used in the statistical analyses in order to retain all the variation and complexity of the facial behavior (Action Unit combinations) used by the macaques (lines 403-405). Indeed the bootstrapping procedure was used to account for any biases in the data.

      Overall, I would recommend providing more information on the actual behaviour of the animals. The paper is strong in handling highly derived indices representing the behaviour, but the reader learns little about the animals' behaviour. Thus, it would be great if statements about the entropy ratio were translated into what these measures represent in real life. For context specificity, this is clear, but for entropy, not so much.

      A high entropy ratio essentially suggests that a species uses a high variety of unique facial behavior/signals and all signals in the repertoire are used roughly equally often (rather than one facial behavior being used 90% of the time and others rarely used). We have tried our best to better explain this point in the introduction (lines 75-81) and discussion (lines 215-222). Discussing exactly what these signals are and what they mean was beyond the scope of this paper.
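
      As a rough illustration of the description above, the sketch below computes a normalized entropy for a list of observed facial signals. It assumes the entropy ratio is the Shannon entropy divided by the maximum entropy possible for the observed repertoire size; the exact definition used in the paper may differ, and the function name and example labels are hypothetical.

```python
import math
from collections import Counter

def entropy_ratio(observations):
    """Shannon entropy of the observed signals divided by the maximum
    entropy for the same repertoire size (assumed definition)."""
    counts = Counter(observations)
    total = sum(counts.values())
    if len(counts) < 2:
        # A repertoire of one signal carries no uncertainty.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(counts))

# Hypothetical repertoires: one signal dominating vs. all signals used equally.
skewed = ["A"] * 90 + ["B"] * 5 + ["C"] * 5
even = ["A", "B", "C"] * 30

print(f"skewed: {entropy_ratio(skewed):.2f}")
print(f"even:   {entropy_ratio(even):.2f}")
```

      Under this assumed definition, a repertoire in which one signal is used 90% of the time gives a ratio well below 1, whereas equal use of all signals gives a ratio of 1.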

      Line 106: nepotism, not kinship

      Changed as suggested (line 106).

      Line 113: I would avoid statements about how a monkey society is perceived by its members.

      We think that noting how individuals may perceive their social environment is worthwhile when defining social complexity, so have retained this point but changed the phrasing to be more speculative (lines 112-113).

      Line 329: I was very surprised that only one male was represented in the data for rhesus monkeys. The authors try to wriggle their way out of this issue in the supplementary material ("Therefore, we have no a priori reason to expect an overall difference in the diversity and complexity of facial behaviour between the sexes"), but I think this is a major shortcoming of the analysis. They should ascertain whether there are no sex differences in the other two species regarding their variables of interest. They could then make a very cautious case for there being no sex differences in rhesus either. But of course, they would not know for sure.

      As with our response to reviewer 2 above, we agree that it would have been preferable to collect data on more than one male rhesus macaque, but that was unfortunately not possible. We are not aware of any studies showing differences in the use of facial behavior between male and female rhesus macaques. If differences exist, most likely these would occur in a sexual/mating context. However, in our study we only considered affiliative (non-sexual), submissive, and aggressive contexts, where we have no a priori reason to believe that there are sex differences. Looking at sex differences in the use of facial behavior would be a worthwhile study on its own, but it is outside the scope of this paper.

      This paper would make a stronger contribution if it focussed on the comparative analysis of facial expressions and removed the attempt of testing the social complexity for communicative complexity hypothesis.

      A comparative analysis of the contextual use of specific facial movements is important. But this paper is focused on making a more general comparison of the communication style and complexity across species. The social complexity hypothesis for communicative complexity is one of the key theoretical frameworks for such an investigation and allows us to frame our study in a broader context. We contribute important data on 3 species with methods that can be replicated and extended to other species. Therefore, we believe that it is a worthy contribution to investigations of the evolution of complex communication.

      REFERENCES

      Micheletta, J., J. Whitehouse, L.A. Parr, and B.M. Waller. ‘Facial Expression Recognition in Crested Macaques (Macaca nigra)’. Animal Cognition 18 (2015): 985–90. https://doi.org/10/f7fvnh.

      Micheletta, Jérôme, Jamie Whitehouse, Lisa A. Parr, Paul Marshman, Antje Engelhardt, and Bridget M. Waller. ‘Familiar and Unfamiliar Face Recognition in Crested Macaques (Macaca nigra)’. Royal Society Open Science 2 (2015): 150109. https://doi.org/10/ggx9k9.

      Parr, L. A., and M. Heintz. ‘Facial Expression Recognition in Rhesus Monkeys, Macaca mulatta’. Animal Behaviour 77 (2009): 1507–13. https://doi.org/10/bbsp5n.

      Parr, L.A., M. Heintz, and G. Pradhan. ‘Rhesus Monkeys (Macaca mulatta) Lack Expertise in Face Processing’. Journal of Comparative Psychology 122 (2008): 390–402. https://doi.org/10/d7w6bv.

      Micheletta, J., and B.M. Waller. ‘Friendship Affects Gaze Following in a Tolerant Species of Macaque, Macaca nigra’. Animal Behaviour 83 (2012): 459–67. https://doi.org/10/c4f8n2.

      Thierry B. Where do we stand with the covariation framework in primate societies? Am. J. Biol. Anthropol. 128 (2021): 5–25. https://doi.org/10.1002/ajpa.24441

      Tomasello, M., J. Call, and B. Hare. ‘Five Primate Species Follow the Visual Gaze of Conspecifics’. Animal Behaviour 55 (1998): 1063–69. https://doi.org/10/bmq7xh.

      Teufel, C., A. Gutmann, R. Pirow, and J. Fischer. ‘Facial Expressions Modulate the Ontogenetic Trajectory of Gaze-Following among Monkeys’. Developmental Science 13 (2010): 913–22. https://doi.org/10/b6j5r7.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful for the helpful comments of both reviewers and have revised our manuscript with them in mind.

      One of the main issues raised was that readers may by default assume that our models are correct. We in fact made it very clear in our discussion that the models are merely hypotheses that will need testing by “wet” experiments and we do not therefore agree that even readers unfamiliar with AF would assume that the models must be correct. It was also suggested that readers could be reassured by including extensive confidence estimates such as PAE plots. As it happens, every single model described in the manuscript had reasonably high PAE scores and more crucially the entire collection of output files, including PAE data, are readily accessible on Figshare at https://doi.org/10.6084/m9.figshare.22567318.v2, a fact that the reviewers appear to have overlooked. The Figshare link is mentioned three times in the manuscript. Embedding these data within the manuscript itself would in our view add even more details and we have therefore not included them in our revised manuscript. Likewise, it is rather simple for any reader to work out which part of a PAE matrix corresponds to an interaction observed in the corresponding pdb prediction. Besides which, it is our view that the biological plausibility and explanatory power of models is just as important as AF metrics in judging whether they may be correct, as is indeed also the case for most experimental work.

      Another important point was that the manuscript was too long and not readable. Yes, it is long and it could well be argued that we could have written a different type of manuscript, focusing entirely on what is possibly the simplest and most important finding, namely that our AF models suggest that in animal cells Wapl appears to form a quaternary complex with SA, Pds5, and Scc1 in a manner suggesting that a key function of Wapl’s conserved CTD is to sequester Scc1’s N-terminal domain after it has dissociated from Smc3. For right or for wrong, we decided that this story could not be presented on its own but also required 1) an explanation for how Scc1 is induced to dissociate from Smc3 in the first place and 2) an explanation for why the quaternary complex predicted for animal cells was not initially predicted for fungi such as yeast. The yeast situation was an exception that clearly needed explaining if the theory was to have any generality and it turned out that delving into the intricate details of the genetics of releasing activity in yeast was eventually required and yielded valuable new insights. We also believe that our work on the recruitment of Eco/Esco acetyl transferases to cohesin and the finding that sororin binds to the Smc3/Scc1 interface also provided important insight into how releasing activity is regulated. We acknowledge that the paper is indeed long but do not think that it is badly written. It is above all a long and complex story that in our view reveals numerous novel insights into how cohesin’s association with chromosomes is regulated, and we have endeavoured to eliminate any excessive speculation. We feel it is not our fault that cohesin uses complex mechanisms.

      Notwithstanding these considerations, we have in fact simplified a few sections and removed one or two others but acknowledge that we have not made substantial cuts.

      It was pointed out that a key feature of our modelling, namely the predicted association of Wapl’s C-terminal domain with SA/Scc3’s CES is inconsistent with published biochemical data. The AF predictions for this interface are universally robust in all eukaryotic lineages and crucially fully consistent with published and unimpeachable genetic data. We note that any model that explains all findings is bound to be wrong for the very simple reason that some of these findings will prove to be incorrect. There is therefore an art in Science of judging which data must be explained and accommodated and which should be ignored. In this particular case, we chose to ignore the biochemistry. Time will tell whether our judgement proves correct.

      Last but not least, it was suggested that we might provide some experimental support for our proposed SA/Scc3-Pds5-Scc1-WaplC quaternary complex. We are in fact working on this by introducing cysteine pairs (that can be crosslinked in cells) into the proposed interfaces but decided that such studies should be the topic of a subsequent publication. It would be impossible with the resources available to our labs to follow up all of the potential interactions and we therefore decided to exclude all such experiments.

      We are grateful for the detailed comments provided by both reviewers, many of which were very helpful, and in many but not all cases have amended the manuscript accordingly.

      With regard to the more specific comments:

      Reviewer #1 (Recommendations For The Authors):

      1) One concern is that observed interfaces/complexes arise because AF-multimer will aim to pack exposed, conserved and hydrophobic surfaces or regions that contain charge complementarity. The risk is that pairwise interaction screens can result in false positive & non-physiological interactions. It is therefore important to report the level of model confidence obtained for such AF calculations:

      A) The authors should color the key models according to pLDDT scores obtained as reported by AF. This would allow the reader to judge the estimated accuracy of the backbone and side chain rotamers obtained. At least for the key models and interactions it would be important to know if the pLDDT score is >90 (Correct backbone and most rotamers) or >70 (only backbone is correct).

      B) It would also be important to report the PAE plots to allow estimation of the expected position error for most of the important interactions. pLDDT coloring and PAE plots can be shown side-by-side as shown in other published data (e.g. https://pubmed.ncbi.nlm.nih.gov/35679397/ (Supplementary data)).

      C) The authors should include a Table showing the confidence of template modeling scores for the predicted protein interfaces as ipTM, ipTM+pTM as reported by AlphaFold-multimer. Ideally, they would also include DockQ scores but this may not be essential. Addition of such scores would help classification into Incorrect, Acceptable or of high quality. For example, line 1073 et seq the authors show a model of a SCC1SA and ESCO1 complex (Fig. 37). Are the modeling scores for these interfaces high? It does not help that the authors show cartoons without side chains. Can the authors provide a close-up view of the two interfaces? Are the amino acids indeed packed in a manner expected for a protein interface? Can we exclude the possibility that the prediction is obtained merely because the sequence segments (e.g. in ESCO1 & ESCO2) are hydrophobic and conserved?

      We do not agree that including this level of detail to the text/figures of the manuscript would be suitable. All the relevant data for those who may be sceptical about the models are readily available at https://doi.org/10.6084/m9.figshare.22567318.v2. In our view, the cartoon versions of the models are easier for a reader to navigate. Anyone interested in the molecular details can look at the models directly.

      Importantly, no amount of statistical analysis can completely validate these models. What is required are further experiments, which will be the topic of further work from our and, I dare say, from other laboratories.

      D) When they predict an interaction between the SA2:SCC1 complex and Sororin's FGF motif, they find that only 1/5 models show an interaction and that the interaction is dissimilar to that seen of CTCF. Again, it would be helpful to know about modeling scores. Can they show a close-up view of the SORORIN FGF binding interface to see if a realistic binding mode is obtained? Can they indicate the relevant region on the PAE plot?

      Given that AF greatly favours other interactions of Sororin’s FGF motif over its interaction with SA2-Scc1, we do not agree that dwelling on the latter would serve any purpose.

      2) Line 996: AF predicts with high confidence an interaction between Eco1 & SMC3hd. What are the ipTM (& DockQ if available) scores. Would the interface score High, Medium or Acceptable?

      As mentioned, see https://doi.org/10.6084/m9.figshare.22567318.v2.

      3) Line 1034 et seq: Eco1/ESCO1/ESCO2 interaction with PDS5. Interface scores need to be shown to determine that the models shown are indeed likely to occur. If these interactions have low model confidence, Fig. 36 and discussion around potential relevance to PDS5-Eco1 orientation relative to the SMC3 head remains highly speculative and could be expunged.

      See https://doi.org/10.6084/m9.figshare.22567318.v2. It should be clear that the predictions are very similar in fungi and animals. Crucially, we know that Pds5 is essential for acetylation in vivo, so the models appear plausible from a biological point of view.

      4) Considering the relatively large interface between ECO1 and SMC3, would the author consider the possibility that in addition to acetylating SMC3's ATPase domain, ECO1 remains bound to cohesin-DNA complex, as proposed for ESCO1 by Rahman et al (10.1073/pnas.1505323112)?

      This is certainly possible but we would not want to indulge in such speculation.

      5) E.g. Line 875 but also throughout the text: As there is no labeling of the N- and C-termini in the Figures, it is frequently unclear what the authors are referring to when they mention that AF models orient chains in a certain manner.

      Good point. This has been amended. However, the positions of the N- and C-termini are all available at https://doi.org/10.6084/m9.figshare.22567318.v2.

      6) Fig19B: PAE plots: authors should indicate which chains correspond to A, B, C. Which segment corresponds to the TYxxxR[T/S]L motif? Can they highlight this section on the PAE plot?

      Good point and amended in the revised manuscript.

      Minor comments:

      1) Line 440: the WAPL YSR motif is not shown in Fig. 14A

      2) Line 691: Scc3 spelling error.

      3) Line 931: Sentence ending '... SCC3 (SCC3N).' requires citation.

      4) Line 1008: Figure reference seems wrong. It should read: Fig. 34A left and right. Fig. 34B does not contain SCC1.

      Many thanks for spotting these. Hopefully, all corrected.

      5) Fig. 41 can be removed as it shows the absence of the interaction of Sororin with SMC1:SCC1. Sufficient to mention in the text that Sororin does not appear to interact with SMC1:SCC1.

      This is possible but we decided to leave this as is.

      Reviewer #2 (Recommendations For The Authors):

      Minor points

      (1) Are there any predicted models in which one of the two dimer interfaces of the hinge is open when the coiled coils are folded back, as seen in the cryo-EM structure of human cohesin-NIPBL complex in the clamped state?

      No AF runs ever predicted half opened hinges. It is possible that the introduction of mutations in one of the two interfaces might reveal a half-opened state and we ought to try this. However, it would not be appropriate for this manuscript, we believe.

      (2) Structures of the SA-Scc1 CES bound to [Y/F]xF motifs from Sgo1 and CTCF have been reported, suggesting that a similar motif could interact with SA/Scc3. Surprisingly, AF did not predict an interaction between Scc3/SA and Wapl FGF motifs, which only bind to the Pds5 WEST region. On the other hand, AF predicted interactions of the Sororin FGF motif with both Pds5 WEST and SA CES. Can the authors comment on this Wapl FGF binding specificity? What will happen if a Wapl fragment lacking the CTD is used in the prediction?

      This seems to be an academic point as the CTD is always present.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      1) The authors need to validate that RAP1-HA still retains its essential function. As indicated above, if RAP1-HA still retains its essential functions, cells carrying one RAP1-HA allele and one deleted allele are expected to grow the same as WT cells. These cells should also have the WT VSG expression pattern, and RAP1-HA should still interact with TRF.

      We demonstrated that C-terminally HA-tagged RAP1 co-localizes with telomeres by a combination of immunofluorescence and fluorescence in situ hybridization (Cestari and Stuart, 2015, PNAS), and co-immunoprecipitates telomeric and 70 bp repeats (Cestari et al. 2019 Mol Cell Biol). We also showed by immunoprecipitation and mass spectrometry that HA-tagged RAP1 interacts with nuclear and telomeric proteins, including PIP5Pase (Cestari et al. 2019). Others have also tagged T. brucei RAP1 with HA without disrupting its nuclear localization (Yang et al. 2009, Cell), all of which indicate that the HA-tag does not affect protein function. As for the suggested experiment, there is no guarantee that cells lacking one allele of RAP1 will behave as wildtype, i.e., normal growth and repression of VSG genes. Also, less than 90% of T. brucei TRF was reported to interact with RAP1 (Yang et al. 2009, Cell), which might be indirect via their binding to telomeric repeats rather than direct protein-protein interactions.

      2) The authors need to remove the His6 tag from the recombinant RAP1 fragments before the EMSA analysis. This is essential to avoid any artifacts generated by the His6-tagged proteins.

      Our controls show that the His-tag is not interfering with RAP1-DNA binding. We show in Fig 3C-G by EMSA and in Fig S5 by EMSA and microscale thermophoresis that His-tagged full-length rRAP1 does not bind to scrambled telomeric dsDNA sequences, which demonstrates that His-tagged rRAP1 does not bind nonspecifically to DNA. Moreover, in Fig 3G and Fig S5, we show that His-tagged rRAP1(1-300) also does not bind to 70 bp or telomeric repeats. In contrast, the full-length His-tagged rRAP1, rRAP1(301-560), or rRAP1(561-855) bind to 70 bp or telomeric repeats (Fig 3C-G). Since all proteins were His-tagged, the His tag cannot be responsible for the DNA binding. We have worked with many different His-tagged proteins for nucleic acid binding and enzymatic assays without any interference from the tag (Cestari and Stuart, 2013, JBC; Cestari et al. 2013, Mol Cell Biol; Cestari and Stuart, 2015, PNAS; Cestari et al. 2016, Cell Chem Biol; Cestari et al. 2019, Mol Biol Cell).

      3) More details need to be provided for ChIPseq and RNAseq analysis regarding the read numbers per sample, mapping quality, etc.

      Table S3 includes information on sequencing throughput and read length. Mapping quality was included in the Methods section “Computational analysis of RNA-seq and ChIP-seq”, starting at line 499. In summary, we filtered reads to keep primary alignment (eliminate supplementary and secondary alignments). We also analyzed ChIP-seq with MAPQ ≥20 (99% probability of correct alignment) to distinguish RAP1 binding to specific ESs, including silent vs active ES (ChIP-seq). We included Fig S4 to show the effect of filtering alignments on the active vs silent ESs. We used MAPQ ≥30 to analyze RNA-seq mapping to VSG genes, including those in subtelomeric regions. Our scripts are available at https://github.com/cestari-lab/lab_scripts. We also included in the Methods, lines 522-524: “Scripts used for ChIP-seq, RNA-seq, and VSG-seq analysis are available at https://github.com/cestari-lab/lab_scripts. A specific pipeline was developed for clonal VSG-seq analysis, available at https://github.com/cestarilab/VSG-Bar-seq.”
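
      The correspondence between the MAPQ thresholds and alignment probabilities quoted above follows from the standard Phred-scaled definition of mapping quality, MAPQ = -10 log10(P_wrong). A minimal sketch of that conversion is given below; the function name is hypothetical, and individual aligners may compute or cap MAPQ differently.

```python
def mapq_to_correct_probability(mapq):
    """Convert a Phred-scaled mapping quality into the probability that
    the reported alignment position is correct: 1 - 10^(-MAPQ/10)."""
    return 1.0 - 10 ** (-mapq / 10)

# Thresholds mentioned in the response (20 for ChIP-seq, 30 for RNA-seq),
# plus 10 for comparison.
for threshold in (10, 20, 30):
    p = mapq_to_correct_probability(threshold)
    print(f"MAPQ >= {threshold}: at least {p:.3f} probability of correct alignment")
```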

      4) The authors should revise the Discussion section to clearly state the authors' speculations and their working models (the latter of which need solid supporting evidence). Specifically, statements in lines 218 - 219 and lines 224-226 need to be revised.

      The statement “likely due to RAP1 conformational changes” in line 228 discusses how binding of PI(3,4,5)P3 could affect RAP1 Myb and MybL domains binding to DNA. We did not make a strong statement but discussed a possibility. We believe that it is beneficial to the reader to have the data discussed, and we do not feel this point is overly speculative. For lines 224-226 (now 234-235), the statement refers to the finding of RAP1 binding to centromeric regions by ChIP-seq, which is a new finding but not the focus of this work. To make it clear that it does not refer to telomeric ESs, we edited: “The finding of RAP1 binding to subtelomeric regions other than ESs, including centromeres, requires further validation.” Since RAP1 binding to centromeres is not the focus of the work, future studies are necessary to follow up, and we believe it is appropriate in the Discussion to be upfront and highlight this point to the readers.

      Our model is based on the data presented here but also on scientific literature. We have reviewed the Discussion to prevent broad speculations. When discussing a model, we stated (line 245): “The scenario suggests a model in which …”, to state that this is a working model. Similarly, in Results (line 201) we included: “Our data suggest a model in which…”.

      5) The authors should revise the title to reflect a more reasonable conclusion of the study.

      We agree that the title should be changed to imply a direct role of PI(3,4,5)P3 regulation of RAP1, which is not captured in the original title. This will provide more specific information to the readers, especially those broadly interested in telomeric gene regulation and RAP1. The new title is: PI(3,4,5)P3 allosteric regulation of repressor activator protein 1 controls antigenic variation in trypanosomes

      6) The authors are recommended to provide an estimation of the expression level of the V5-tagged PIP5pase from the tubulin array in reference to the endogenous protein level.

      The relative mRNA levels of the exclusive expression of PIP5Pase mutant compared to the wildtype is available in the Data S1, RNA-seq. The Mut PIP5Pase allele’s relative expression level is 0.85-fold to the WT allele (both from tubulin loci). We also showed by Western blot the WT and Mut PIP5Pase protein expression (Cestari et al. 2019, Mol Cell Biol). Concerning PIP5Pase endogenous alleles, we compared normalized RNA-seq counts per million from the conditional null PIP5Pase cells exclusively expressing WT or the Mut PIP5Pase alleles (Data S1, this work) to our previous RNA-seq of single-marker 427 strain (Cestari et al. 2019, Mol Cell Biol). We used the single-marker 427 because the conditional null cells were generated in this strain background. The PIP5Pase WT and Mut mRNAs expressed from tubulin loci are 1.6 and 1.3-fold the endogenous PIP5Pase levels in single-marker 427, respectively. We included a statement in the Methods, lines 275-278: “The WT or Mut PIP5Pase mRNAs exclusively expressed from tubulin loci are 1.6 and 1.3-fold the WT PIP5Pase mRNA levels expressed from endogenous alleles in the single marker 427 strain. The fold-changes were calculated from RNA-seq counts per million from this work (WT and Mut PIP5Pase, Data S1) and our previous RNA-seq from single marker 427 strain (24).”
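
      The fold-changes described above compare counts per million (CPM) between two RNA-seq libraries. The sketch below shows the arithmetic only; the read counts and library sizes are hypothetical placeholders chosen for illustration, not values from Data S1, and the function names are assumptions.

```python
def counts_per_million(gene_count, library_size):
    """Scale a raw read count by the total mapped reads in the library."""
    return gene_count * 1e6 / library_size

def fold_change(cpm_test, cpm_reference):
    """Ratio of normalized expression between two libraries."""
    return cpm_test / cpm_reference

# Hypothetical numbers for illustration only.
cpm_tubulin_locus = counts_per_million(gene_count=160, library_size=1_000_000)
cpm_endogenous = counts_per_million(gene_count=200, library_size=2_000_000)

print(f"fold change: {fold_change(cpm_tubulin_locus, cpm_endogenous):.2f}")
```

      Normalizing to library size before taking the ratio is what makes counts from separately sequenced experiments comparable.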

      7) The authors are recommended to provide more detailed EMSA conditions such as protein and substrate concentrations. Better quality EMSA gels are preferred.

      All concentrations were already provided in the Methods section. See line 356, in topic Electrophoretic mobility shift assays: “100 nM of annealed DNA were mixed with 1 μg of recombinant protein…”. For microscale thermophoresis, also see lines 375-376 in topic Microscale thermophoresis binding kinetics: “1 μM rRAP1 was diluted in 16 two-fold serial dilutions in 250 mM HEPES pH 7.4, 25 mM MgCl2, 500 mM NaCl, and 0.25% (v/v) NP-40 and incubated with 20 nM telomeric or 70 bp repeats…”. Note that two different biochemical approaches, EMSA and microscale thermophoresis, were used to assess rRAP1-His binding to DNA. Both give results that agree (Fig 3 and 5, and Fig S5); microscale thermophoresis additionally shows the binding kinetics (data available in Table 1). The EMSA images clearly show the binding of RAP1 to 70 bp or telomeric repeats but not to scrambled telomeric repeat DNA.
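
      For readers who want the concentration range implied by the microscale thermophoresis protocol quoted above, the sketch below lists a 16-point two-fold dilution series. It assumes the first point is the undiluted 1 μM protein; the Methods may instead count the first dilution as the starting point, which would shift every value by a factor of two.

```python
def twofold_dilution_series(start_nm, n_points):
    """Concentrations (nM) of a two-fold serial dilution, highest first."""
    return [start_nm / 2 ** i for i in range(n_points)]

# 1 uM = 1000 nM; 16 points as stated in the quoted Methods text.
series_nm = twofold_dilution_series(start_nm=1000.0, n_points=16)

for i, conc in enumerate(series_nm, start=1):
    print(f"point {i:2d}: {conc:10.4f} nM")
```

      Under that assumption the series spans from 1000 nM down to roughly 0.03 nM, bracketing the 20 nM DNA concentration used in the assay.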

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Figures

      All figures should have their axes properly labeled and units should be indicated. For many of the ChIPseq datasets it is not clear whether the authors show a fold enrichment or RPM and whether they used all reads or only uniquely mapping reads. Especially the latter is a very important piece of information when analyzing expression sites and should always be reported. The authors write, that all RNA-seq and ChIP-seq experiments were performed in triplicate. What is shown in the figures, one of the replicates? Or the average?

      ChIP-seq is shown as fold enrichment; we clarified this in the figures by labeling the y-axis RAP1-HA ChIP/Input (log2). We included in figure legends, see line 710: “Data show fold-change comparing ChIP vs Input.”. For quantitative graphs (Fig 2B, D, and E, and Fig 5F and G), data are shown as the mean of biological replicates. Graphs generated in the Integrative Genomics Viewer (IGV; qualitative graphs) show representative data (Fig 2A, C, and F, and Fig 5D-E). All statistical analyses were calculated from the three biological replicates. Uniquely mapped reads were used. We also included ChIP-seq analysis with MAPQ ≥10 and 20 (90% and 99% probability of correct alignment, respectively) to distinguish RAP1 binding to ESs. Fig S4 shows the various mapping stringency and demonstrates the enrichment of RAP1-HA to silent vs active ES.
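
      The y-axis quantity described above, log2 of ChIP over Input averaged across biological replicates, can be sketched as follows. The pseudocount, the choice to average after the log transform, and the example coverage values are all assumptions made for illustration; the authors' scripts may handle these steps differently.

```python
import math
from statistics import mean

def log2_enrichment(chip, input_, pseudocount=1.0):
    """log2 ratio of normalized ChIP coverage to Input coverage.
    The pseudocount (an assumption) avoids division by zero."""
    return math.log2((chip + pseudocount) / (input_ + pseudocount))

def mean_log2_enrichment(replicates, pseudocount=1.0):
    """Average the per-replicate log2 ratios for one genomic bin."""
    return mean(log2_enrichment(c, i, pseudocount) for c, i in replicates)

# Hypothetical (ChIP, Input) normalized coverage for three replicates of one bin.
replicates = [(7.0, 1.0), (3.0, 1.0), (1.0, 1.0)]
print(f"mean log2(ChIP/Input): {mean_log2_enrichment(replicates):.2f}")
```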

      Figure 1 is very important for the main argument of the manuscript, but very difficult (impossible for me) to fully understand. It would be great if the author could make an effort to clarify the figure and improve the labels. Panel Fig 1E. Here it is impossible to read the names of the genes that are activated and therefore it is impossible to verify the statements made about the activation of VSGs and the switching.

      We have edited Fig 1E to include the most abundant VSGs, which decreased the amount of information in the graph and increased the label font. We also re-labeled each VSG with chromosome or ES name and common VSG name when known (e.g., VSG2). We included Table S1 in the supplementary information with the data used to generate Fig 1E. In Table S1, the reader will be able to check the VSG gene IDs and evaluate the data in detail. We included in the legend, line 700: “See Table S1 for data and gene IDs of VSGs.”

      Figure 1F: This panel is important and should be shown in more detail as it distinguishes VSG switching from a general VSG de-repression phenotype. VSG-seq is performed in a clonal manner here after PIP5Pase KD and re-expression. To show that proper switching has taken place in the different clones, instead of a persistent VSG de-repression, the expression level of more VSGs should be shown (e.g. as in panel E) to show that there is really only one VSG detected per clone. For example, it is not clear what the authors 'called' the dominant VSG gene.

      We showed in supplementary information Fig S1 B-C examples of reads mapping to the VSGs. Now we included a graph (Fig S1 D) that quantifies reads mapped to the VSG selected as expressed compared to other VSG genes (considered not expressed). The data show an average of several clones analyzed. Other VSGs (not selected) are at the noise level (about 4 normalized counts) compared to >250 normalized counts for the VSGs selected as expressed.

      As mentioned in the public comments, I don't see how the data from Fig 1E and 1F fit together. Based on Fig 1E VSG2 is the dominant VSG, based on Fig 1F VSG2 is almost never the dominant VSG, but the VSG from BES 12.

      In Fig 1E, the VSG2 predominates in cells expressing WT PIP5Pase, however, in cells expressing Mut PIP5Pase, this is not the case anymore. Many other VSGs are detected, and other VSG mRNAs are more abundant than VSG2 (see color intensity in the heat map). The Mut cells may also have remaining VSG2 mRNAs (from before switching) rather than continuous VSG2 expression. This is the reason we performed the clonal analysis shown in Fig 1F, to be certain about the switching. While Fig 1E shows potential switchers in the population, Fig 1F confirms VSG switching in clones.

      Many potential switchers were detected in the VSG-seq (Fig 1E, the whole cell population is over 10^7 parasites), but not all potential switchers were detected in the clonal analysis (Fig 1F) because we analyzed 212 clones total, a fraction of the over 10^7 cells analyzed by VSG-seq. Also, it is possible that not all potential switchers are viable. A preference for switching to specific ESs has been observed in T. brucei (Morrison et al. 2005, Int J Parasitol; Cestari and Stuart, 2015, PNAS), which may explain several clones switching to BES12.

      Note that in Fig 1F, tet + cells did not switch VSGs at all; all 118 clones expressed VSG2. We relabeled Fig 1F for clarity and included the VSG names. We added gene IDs in the Figure legends, see line 702 “ BES1_VSG2 (Tb427_000016000), BES12_VSG (Tb427_000008000)…”

      Statements in Introduction / Discussion

      The statement in lines 82/83 is very strong and gives the impression that the PIP5Pase-Rap1 circuit has been proven to regulate antigenic variation in the host. However, I don't think this is the case. The paper shows that the pathway can indeed turn expression sites on and off, but there is no evidence (yet) that this is what happens in the host and regulates antigenic variation during infection. The same goes for lines 214/215 in the discussion.

      We agree with the reviewer, and we edited these statements. The statement lines 82-83: “The data provide a molecular mechanism…” to “The data indicates a molecular mechanism…” For lines 224-225: “and provides a mechanism to control…” to “and indicates a mechanism to control…”. We also included in lines 261-262: “It is unknown if a signaling system regulates antigenic variation in vivo.” Also edited lines 262-263: “…the data indicate that trypanosomes may have evolved a sophisticated mechanism to regulate antigenic variation...”.

      New vs old data

      In general, for Figures 1 - 4, it was a bit difficult to understand which panels showed new findings, and which panels confirmed previous findings (see below for specific examples). In the text and in the figure design, the new results should be clearly highlighted.

      All data presented are new, as detailed below.

      Figure 1: A similar RNA-seq after PIP5Pase deletion was performed in citation 24. Perhaps the focus of this figure should be more on the (clone-specific) VSG-seq experiment after PIP5Pase re-introduction.

      This is the first time we show RNA-seq of T. brucei expressing catalytic inactive PIP5Pase, which establishes that the regulation of VSG expression and switching, and repression of subtelomeric regions, is dependent on PIP5Pase enzyme catalysis, i.e., PI(3,4,5)P3 dephosphorylation. Hence, the relevance and difference of the RNA-seq here vs the previous RNA-seq of PIP5Pase knockdown.

      Figure 2: A similar ChIP-seq of RAP1 was performed in citation 24, with and without PIP5Pase deletion. Could new findings be highlighted more clearly?

      Our and others’ previous work showed ChIP-qPCR, which analyses specific loci. Here we performed ChIP-seq, which shows genome-wide binding sites of RAP1, and new findings are shown here, including binding sites in the BES, MESs, and other genome loci such as centromeres. We also identified DNA sequence bias defining RAP1 binding sites (Fig 2A). We also show by ChIP-seq how RAP1-binding to these loci changes upon expression of catalytic inactive PIP5Pase. To improve clarity in the manuscript, we edited lines 129-130: “We showed that RAP1 binds telomeric or 70 bp repeats (24), but it is unknown if it binds to other ES sequences or genomic loci.”

      Figure 4: Binding of Rap1 to PI(3,4,5)P3, but not to other similar molecules, was previously shown in citation 24. Could new findings be highlighted more clearly?

      We published in reference 24 (Cestari et al. Mol Cell Biol) that RAP1-HA can bind agarose beads-conjugated synthetic PI(3,4,5)P3. Here, we were able to measure T. brucei endogenous PI(3,4,5)P3 associated with RAP1-HA (Fig 4F). Moreover, we showed that the endogenous RAP1-HA and PI(3,4,5)P3 binding is about 100-fold higher when PIP5Pase is catalytic inactive than WT PIP5Pase. The data establish that in vivo endogenous PI(3,4,5)P3 binds to RAP1-HA and how the binding changes in cells expressing mutant PIP5Pase; this data is new and relevant to our conclusions. To clarify, we edited the manuscript in lines 180-182: “To determine if RAP1 binds to PI(3,4,5)P3 in vivo, we in-situ HA-tagged RAP1 in cells that express the WT or Mut PIP5Pase and analyzed endogenous PI(3,4,5)P3 levels associated with immunoprecipitated RAP1-HA”.

      Sequencing. I really appreciate the amount of detail the authors provide in the methods section. The authors do an excellent job of describing how different experiments were performed. However, it would be important that the authors also provide the basic statistics on the sequencing data. How many sequencing reads were generated per run (each replicate of the ChIP-seq and RNA-seq assays)? How long were the reads? How many reads could be aligned?

      The sequencing metrics for RNA-seq and ChIP-seq for all biological replicates were included in Table S3 (supplementary information). The details of the analysis and sequencing quality were described in the Methods section “Computational analysis of RNA-seq and ChIP-seq”. To be clearer about the analysis, we also included in Methods, lines 522-524: “Scripts used for ChIP-seq, RNA-seq, and VSG-seq analysis are available at https://github.com/cestari-lab/lab_scripts. A specific pipeline was developed for clonal VSG-seq analysis, available at https://github.com/cestari-lab/VSG-Bar-seq.”.

      Minor comments:

      Figure 1B: I would recommend highlighting the non-ES VSGs and housekeeping genes with two more colors in the volcano plot, to show that it is mostly the antigen repertoire that is deregulated, and not the Pol II transcribed housekeeping genes. This is not entirely clear from the panel as it is right now.

      The suggestion was incorporated in Fig 1B. We color-coded the figure to include BES VSGs, MES VSGs, ESAGs, subtelomeric genes, core genes (typically Pol II and Pol III transcribed genes), and Unitig genes, those genes not assembled in the 427-2018 reference genome.

      Were the reads in Figure 2a filtered in the same way as those in Figure 2C? To support the statements, only unique reads should be used.

      Yes, we also added Fig S4 to make more clear the comparison between read mapping to silent vs active ES.

      It would be good if the authors could add a supplementary figure showing the RAP1 ChIP-seq (WT and cells lacking a functional PIP5Pase) for all silent expression sites.

      We had RAP1 ChIP-seq from cells expressing WT PIP5Pase already. We have modified it to include data from the Mutant PIP5Pase. See Fig S3 and S5.

      In Figure 5D, after depletion of PIP5Pase, RAP1 binding appears to decrease across ESAGs, but ESAG expression appears to increase. How can this be explained with the model of RAP1 repressing transcription?

      We included in the Results, lines 208-212: “The increased level of VSG and ESAG mRNAs detected in cells expressing Mut PIP5Pase (Fig 5D) may reflect increased Pol I transcription. It is possible that the low levels of RAP1-HA at the 50 bp repeats affect Pol I accessibility to the BES promoter; alternatively, RAP1 association to telomeric or 70 bp repeats may affect chromatin compaction or folding impairing VSG and ESAG genes transcription.”.

      Reviewer #3 (Recommendations For The Authors):

      Line 114 - typo? Procyclic instead of procyclics:

      Fixed, thanks.

      Line 233 - the phrasing here is confusing, may want to replace "whose" with "which" (if I am interpreting correctly):

      Thanks, no changes were needed. I have had the sentence reviewed by a Ph.D.-level scientific writer.

      Methods - there is no description of VSG-seq analysis in the methods. Is it done the same way as the RNA-seq analysis? Is the code for analysis/generating figures available online?

      The procedure is similar. We included an explanation in Methods, lines 503-504: “RNA-seq and VSG-seq (including clonal VSG-seq) mapped reads were quantified…”. Also, in lines 522-524: “Scripts used for ChIP-seq, RNA-seq, and VSG-seq analysis are available at https://github.com/cestari-lab/lab_scripts. A specific pipeline was developed for clonal VSG-seq analysis, available at https://github.com/cestari-lab/VSG-Bar-seq.”.

      Fig 1H - Is this from RNA-seq or VSG-seq analysis of procyclics?

      The procyclic forms VSG expression analysis was done by real-time PCR. To clarify it, we included it in the legend “Expression analysis of ES VSG genes after knockdown of PIP5Pase in procyclic forms by real-time PCR”. We also amended the Methods, under the topic RNA-seq and real-time PCR, line 402-407: “For procyclic forms, total RNAs were extracted from 5.0x10^8 T. brucei CN PIP5Pase growing in Tet + (0.5 µg/mL, no knockdown) or Tet – (knockdown) at 5h, 11h, 24h, 48h, and 72h using TRIzol (Thermo Fisher Scientific) according to manufacturer's instructions. The isolated mRNA samples were used to synthesize cDNA using ProtoScript II Reverse Transcriptase (New England Biolabs) according to the manufacturer's instructions. Real-time PCRs were performed using VSG primers as previously described (23).”

      Fig 2 A - Where it says "downstream VSG genes" I assume "downstream of VSG genes" is meant? the regions described in this figure might be more clearly laid out in the text or the legend

      Fixed, thanks. We included in the text in Results, line 140: “… and Ts and G/Ts rich sequences downstream of VSG genes”.

      Fig 2E - what does "Flanking VSGs" mean in this context?

      We added to line 705, figure legends: “Flanking VSGs, DNA sequences upstream or downstream of VSG genes in MESs.”

      Fig 2H - Why is the PIP5Pase Mutant excluded from the Chr_1 core visualization?

      We did not notice it. We included it now; thanks.

    1. We were not many days in the merchant’s custody before we were sold after their usual manner, which is this:—On a signal given, (as the beat of a drum), the buyers rush at once into the yard where the slaves are confined, and make choice of that parcel they like best. The noise and clamour with which this is attended, and the eagerness visible in the countenances of the buyers, serve not a little to increase the apprehension of the terrified Africans, who may well be supposed to consider them as the ministers of that destruction to which they think themselves devoted. In this manner, without scruple, are relations and friends separated, most of them never to see each other again. I remember in the vessel in which I was brought over, in the men’s apartment, there were several brothers who, in the sale, were sold in different lots; and it was very moving on this occasion to see and hear their cries at parting.

      This shows how common the separation of families was, since many never saw each other again, which is horrible. He is describing his personal experience of being sold to different places and suffering emotionally. It's sad, and I want to read on to find out about his sister's well-being; it makes me wonder whether he ever got to see her again, as he did the last time.

    2. The first object which saluted my eyes when I arrived on the coast was the sea, and a slave-ship, which was then riding at anchor, and waiting for its cargo. These filled me with astonishment, which was soon converted into terror, which I am yet at a loss to describe, nor the then feelings of my mind. When I was carried on board I was immediately handled, and tossed up, to see if I were sound, by some of the crew; and I was now persuaded that I had got into a world of bad spirits, and that they were going to kill me. Their complexions too differing so much from ours, their long hair, and the language they spoke, which was very different from any I had ever heard, united to confirm me in this belief. Indeed, such were the horrors of my views and fears at the moment, that, if ten thousand worlds had been my own, I would have freely parted with them all to have exchanged my condition with that of the meanest slave in my own country. When I looked round the ship too, and saw a large furnace or copper boiling, and a multitude of black people of every description chained together, every one of their countenances expressing dejection and sorrow, I no longer doubted my fate, and, quite overpowered with horror and anguish, I fell motionless on the deck and fainted. When I recovered a little, I found some black people about me, who I believed were some of those who brought me on board, and had been receiving their pay; they talked to me in order to cheer me, but all in vain. I asked them if we were not to be eaten by those white men with horrible looks, red faces, and long hair? They told me I was not; and one of the crew brought me a small portion of spiritous liquor in a wine glass; but, being afraid of him, I would not take it out of his hand.
One of the blacks therefore took it from him and gave it to me, and I took a little down my palate, which, instead of reviving me, as they thought it would, threw me into the greatest consternation at the strange feeling it produced having never tasted any such liquor before. Soon after this, the blacks who brought me on board went off, and left me abandoned to despair. I now saw myself deprived of all chance of returning to my native country, or even the least glimpse of hope of gaining the shore, which I now considered as friendly: and even wished for my former slavery, in preference to my present situation, which was filled with horrors of every kind, still heightened by my ignorance of what I was to undergo. I was not long suffered to indulge my grief; I was soon put down under the decks, and there I received such a salutation in my nostrils as I had never experienced in my life; so that with the loathsomeness of the stench, and crying together, I became so sick and low that I was not able to eat, nor had I the least desire to taste any thing. I now wished for the last friend, Death, to relieve me; but soon, to my grief, two of the white men offered me eatables; and, on my refusing to eat, one of them held me fast by the hands, and laid me across, I think, the windlass, and tied my feet, while the other flogged me severely. I had never experienced any thing of this kind before; and although not being used to the water, I naturally feared that element the first time I saw it; yet, nevertheless, could I have got over the nettings, I would have jumped over the side; but I could not; and, besides, the crew used to watch us very closely who were not chained down to the decks, lest we should leap into the water; and I have seen some of these poor African prisoners, most severely cut for attempting to do so, and hourly whipped for not eating. This indeed was often the case with myself.
In a little time after, amongst the poor chained men, I found some of my own nation, which in a small degree gave ease to my mind. I inquired of them what was to be done with us? they give me to understand we were to be carried to these white people’s country to work for them. I then was a little revived, and thought, if it were no worse than working, my situation was not so desperate: but still I feared I should be put to death, the white people looked and acted, as I thought, in so savage a manner; for I had never seen among any people such instances of brutal cruelty; and this not only shewn towards us blacks, but also to some of the whites themselves. One white man in particular I saw, when we were permitted to be on deck, flogged so unmercifully, with a large rope near the foremast, that he died in consequence of it; and they tossed him over the side as they would have done a brute. This made me fear these people the more; and I expected nothing less than to be treated in the same manner. I could not help expressing my fears and apprehensions to some of my countrymen: I asked them if these people had no country, but lived in this hollow place the ship? they told me they did not, but came from a distant one. ‘Then,’ said I, ‘how comes it in all our country we never heard of them?’ They told me, because they lived so very far off. I then asked, where were their women? had they any like themselves! I was told they had: ‘And why,’ said I, ‘do we not see them?’ they answered, because they were left behind. I asked how the vessel could go? they told me they could not tell; but that there were cloth put upon the masts by the help of the ropes I saw, and then the vessel went on; and the white men had some spell or magic they put in the water when they liked in order to stop the vessel. I was exceedingly amazed at this account, and really thought they were spirits.
I therefore wished much to be from amongst them, for I expected they would sacrifice me: but my wishes were vain; for we were so quartered that it was impossible for any of us to make our escape. While we staid on the coast I was mostly on deck; and one day, to my great astonishment, I saw one of these vessels coming in with the sails up. As soon as the whites saw it, they gave a great shout, at which we were amazed; and the more so as the vessel appeared larger by approaching nearer. At last she came to anchor in my sight, and when the anchor was let go, I and my countrymen who saw it were lost in astonishment to observe the vessel stop; and were now convinced it was done by magic. Soon after this the other ship got her boats out, and they came on board of us, and the people of both ships seemed very glad to see each other. Several of the strangers also shook hands with us black people, and made motions with their hands, signifying, I suppose, we were to go to their country; but we did not understand them. At last, when the ship we were in had got in all her cargo they made ready with many fearful noises, and we were all put under deck, so that we could not see how they managed the vessel. But this disappointment was the least of my sorrow. The stench of the hold while we were on the coast was so intolerably loathsome, that it was dangerous to remain there for any time, and some of us had been permitted to stay on the deck for the fresh air; but now that the whole ship’s cargo were confined together, it became absolutely pestilential. The closeness of the place, and the heat of the climate, added to the number in the ship, which was so crouded that each had scarcely room to turn himself, almost suffocated us. This produced copious perspiration, from a variety of loathsome smells, and brought on a sickness amongst the slaves, of which many died, thus falling victims to the improvident avarice, as I may call it, of their purchasers. 
This wretched situation was again aggravated by the galling of the chains, now become insupportable; and the filth of the necessary tubs, into which the children often fell, and were almost suffocated. The shrieks of the women, and the groans of the dying, rendered the whole a scene of horror almost inconceiveable. Happily perhaps for myself I was soon reduced so low here that it was thought necessary to keep me almost always on deck; and from my extreme youth I was not put in fetters. In this situation I expected every hour to share the fate of my companions, some of whom were almost daily brought upon deck at the point of death, which I began to hope would soon put an end to my miseries. Often did I think many of the inhabitants of the deep much more happy than myself; I envied them the freedom they enjoyed, and as often wished I could change my condition for theirs. Every circumstance I met with served only to render my state more painful, and heighten my apprehensions and my opinion of the cruelty of the whites. One day they had taken a number of fishes; and when they had killed and satisfied themselves with as many as they thought fit, to our astonishment who were on deck, rather than give any of them to us to eat, as we expected, they tossed the remaining fish into the sea again, although we begged and prayed for some as well as we could, but in vain; and some of my countrymen, being pressed by hunger, took an opportunity, when they thought no one saw them, of trying to get a little privately; but they were discovered, and the attempt procured them some very severe floggings.

      To me, this part is kind of dark. He was on the slave ship, where he saw men being tortured and chained up. The crew were so cruel to them. It's sad that he had to witness that. It's heartbreaking that he was first kidnapped along with his sister, then separated from her, sold into slavery, sold on to many different places, and then had to witness men being tortured on the ship. This is very dark and traumatizing. He also still feels hopeless about ever returning home and reuniting with his sister. It makes me wonder whether he will ever reunite with his sister.

    1. Students can simultaneously become both "unstuck" (distanced from the ways they have always thought, no longer so complicit with oppression) and "stuck" (intellectually paralyzed so that they need to work through feelings and thoughts before moving on with the more "academic" part of a lesson). Though paradoxical and in some ways traumatic, this condition should be expected: by teaching students that the very ways in which we think and do things can be oppressive, teachers should expect their students to get upset

      From my readings in EDSCI so far, this is the first time I have seen someone address the heaviness that may come with being a transformative learner. Many of our biases and our students' biases, as well as oppressive ideologies, might be the only things they have ever learned. The word trauma exemplifies the impact of unlearning things that perhaps have been the building blocks of your identity. I think about my 7th graders, primarily Latinx, primarily Christian, primarily male, and primarily from low-income households surrounding our school; they might experience this trauma when presented with ideas that deviate from what they hold to be truth. However, their truth has been their reality, and rather than negate them, they have to be part of the conversation.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their thoughtful comments. Here we provide a point-by-point response to their reviews. All additional experiments that are present in the revised manuscript, or that we plan to include in the final manuscript, are numbered.

      __Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The concept introduced by this paper is exciting and novel. However, the current paucity of presented data can lead to incorrect interpretations of the findings and speculations that might not hold true after a more rigorous assessment of the observed phenomenon.

      The premise of this study builds upon an interaction between the PAXT complex and nuclear YTH domain containing proteins. However, figures 1B and C should be improved. The interacting band for the ZFC3H1 presented in panel B does not seem to match the size of the construct used in panel C. Is the Flag version of ZFC3H1 expressing a smaller isoform for this protein? __

      The reviewer is correct in that endogenous ZFC3H1 (which migrates at 250kD with a minor band at 150kD, see Figure 1B in the initial manuscript) appears to differ from the FLAG-tagged construct as expressed from a plasmid transfected in HEK293 cells (which migrates as two bands at 180kD and 200kD, see Figure 1C in the initial manuscript). For the endogenous protein, the predicted molecular weight is 226kD and the 250kD band disappears when cells are transduced with lentivirus containing shRNAs against ZFC3H1 (see Figure 4A in the initial manuscript), indicating that it is the correct protein. Both the 250kD endogenous protein (*) and the 200kD overexpressed protein (**) in transfected HEK293 and U2OS cells are detected in immunoblots using anti-ZFC3H1 antibodies (see Figure 1 in this document) indicating that the over-expressed protein is indeed ZFC3H1.

      [Figure 1]

      _Figure 1. Molecular Weight Size Comparison of Endogenous ZFC3H1 and FLAG-ZFC3H1 (1-1233). _Lysates from HEK293 and U2OS that were either untransfected or transfected with FLAG-ZFC3H1 (1-1233) plasmid. We labelled the bands corresponding to the endogenous ZFC3H1 “*” and FLAG-ZFC3H1 (1-1233) “**”.

      We have sequenced the plasmid, and discovered that it contains an additional sequence inserted within the middle of the ZFC3H1 cDNA with a premature stop codon. As such, the version of the protein that is expressed from the plasmid only contains amino acids 1-1233 of the endogenous protein and is missing amino acids 1234-1989. The deleted region only contains TPR repeats, and is not known to interact with any of the well characterized interactors of ZFC3H1 (Wang, Nuc Acid Res 2021, Figure 3). We have renamed this construct FLAG-ZFC3H1 (1-1233). Given these new considerations, our results are consistent with the idea that the N-terminal portion of ZFC3H1 interacts with U1-70K, YTHDC1 and YTHDC2. We will change the text to reflect this.

      We are currently in the process of deleting the small insertion to obtain a plasmid that encodes a full length version of human ZFC3H1. For the final manuscript:

      Experiment #1) We will repeat the co-immunoprecipitation experiment with the full length FLAG-ZFC3H1 to determine whether it interacts with YTHDC1 and YTHDC2. This will take a few weeks.

      __Also, the YTHDC1-2 interaction in panel C is not as convincing considering the negative controls lane show some degree of binding. __

      Although the reviewer is correct that there is substantial background binding in the YTHDC1 immunoblot, we disagree with their characterization of the results with the YTHDC2 immunoblot (see Figure 1B-C in the initial manuscript). In the new manuscript we have included:

      Experiment #2) A new co-immunoprecipitate of the FLAG-tagged ZFC3H1 (1-1233) from HEK293 cells under more stringent conditions where the background level of YTHDC1 binding to beads is negligible. We have already completed this experiment (see Figure 1D in the revised manuscript).

      __Additionally, can the authors test if their RNaseA treatment worked? __

      In the new manuscript we have included:

      Experiment #3) A new co-immunoprecipitate of FLAG-YTHDC1 immunoblotted for eIF4AIII from HEK293 cell lysates. We find that without RNAse, there is some eIF4AIII in the precipitates but that the levels diminish substantially after RNAse A/T1 treatment. We have already completed this experiment (see Figure 1B in the preliminary revised manuscript).

      __Why do you need 18 hours to observe the nuclear export of your modifiable construct when inhibiting METTL3 in figure 3? Is it possible that your observation is secondary to phenotypes these cells develop as a result of blocking METTL3? __

      We treated cells for this period of time so that during the expression of the reporter, all of the newly synthesized mRNA is expressed in the absence of m6A methyl transferase activity. For shorter treatment times, it is unclear whether the bulk of the reporter mRNA, which would be synthesized before the treatment, would lose any pre-existing m6A marks, making a negative result hard to interpret. Previously we found that although 50% of intronic polyadenylated (IPA) transcripts from our reporters are rapidly degraded, about 50% are stable and are nuclear retained over extended periods of time (see Lee et al., PLOS ONE 2015; https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0122743 Figure 3B-G). We believe that the bulk of the reporter mRNA that we are visualizing is stable and accumulates in the nucleus. Given that METTL3-depletion inhibits nuclear retention and that versions of the IPA reporter that lack m6A modification motifs are exported, we think that the most likely interpretation of the 18 hour STM2457 treatment experiments is that the lack of methyltransferase activity had a direct effect, rather than an indirect effect, on nuclear retention. We would be open to performing more experiments if the editors insist, however we ordered STM2457 four weeks ago and it has yet to arrive from Sigma-Millipore. Performing this experiment may substantially delay our ability to resubmit the manuscript in a timely manner.

      __Is ALKBH5 nuclear and/or cytoplasmic in the cell system used? __

      According to The Human Protein Atlas, ALKBH5 is predominantly nuclear in U2OS cells, with some present in the cytoplasm (https://www.proteinatlas.org/ENSG00000091542-ALKBH5/subcellular#human).

      In the revised manuscript we have included:

      Experiment #4) Data from subcellular fractionation demonstrating that ALKBH5 is present in both the nucleus and cytoplasm that we have already performed (see Figure 4J in the preliminary revised manuscript).

      __Reviewer #1 (Significance (Required)):

      The study is highly significant __

      ------

      __Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In the manuscript by Lee et al. entitled "N-6-methyladenosine (m6A) Promotes the Nuclear Retention of mRNAs with Intact 5'Splice Site Motifs", the authors provide evidence that m6A modifications within specific regions of transcripts can confer nuclear retention. These results are important because they add to our understanding of how m6A modifications can contribute to post-transcriptional regulation. Although the authors do not quite come out and say this, data seem to be accumulating to suggest that the location of the m6A modifications within a given transcript can dictate the functional consequences of those modifications.__

      We thank the reviewer for pointing this out. We have included a few sentences in the new preliminary revised manuscript pointing out that the location of the m6A modification in IPA transcripts, with respect to intact 5’SS and poly(A) signals, may play a role in promoting nuclear retention.

      __The current work builds on previous findings from these authors identifying factors critical for retention of intronic polyadenylated (IPA) transcripts. The present study identified m6A modification as one of the signals for the retention of such transcripts. The authors use reporters for their analysis and also examine validated endogenous IPA transcripts. The data presented supports the conclusions albeit they show a surprising finding for one of the m6A erasers, ALKBH5. However, there is some controversy over the mechanism by which ALKBH5 functions and whether the m6A mark is truly reversible, so these results may continue to add to this point of view.

      Major Comments: One experiment that might add to the argument would be overexpression of Mettl3 as compared to catalytically inactive Mettl3. The prediction would be that the reporter transcript with intact DRACH sequences would be even more retained in the nucleus in a manner that depends on Mettl3 catalytic activity. For some of the data presented, the reporter is already wholly nuclear so no difference could be detected, but in the U2OS cells shown in Figure 2B, it appears that an increase in nuclear localization might be evident. Such an experiment would add an orthogonal approach to demonstrate that the methylation by Mettl3 is required for retention. Such an experiment might also work with the endogenous IPA transcripts shown in Figure 4, but these transcripts may already be too nuclear to detect any increase in nuclear retention.

      __

      We have performed two experiments that try to address this but they gave negative results:

      Experiment #5) We have over-expressed wildtype and a methyl transferase mutant FLAG-METTL3 and assessed the nuclear export/retention of ftz-Δi-5’SS mRNA. There was no effect (see Figure 2 in this document).

      [Figure 2]

      __Figure 2. Over-expression of METTL3 does not increase the nuclear retention of ftz-Δi-5’SS. __U2OS cells were co-transfected with ftz-Δi-5’SS reporter and either FLAG-METTL3 or FLAG-METTL3-D395A, which lacks methyl-transferase activity (Wang, Mol Cell 2016). Cells were fixed, stained for ftz mRNA by fluorescent in situ hybridization and METTL3 using anti-FLAG antibodies. The nuclear and cytoplasmic distribution of ftz mRNA was quantified as described in the manuscript. Note that this is the average of one independent experiment (each bar consisting of the average of at least 50 cells). We plan to repeat this two more times, but we anticipate that these will show the same result.

      We could include this negative data as a supplemental figure. We believe that there are two possible reasons for this experimental result. First, as the reviewer points out, the reporter transcripts are already too nuclear to detect any significant change. Second, METTL3 is part of a larger complex that includes several proteins including METTL14, WTAP and potentially other proteins (for example see Covelo-Molares, Nuc Acid Res 2021). We may need to co-express all of these proteins to see an effect.

      Experiment #6) We have also expressed versions of ftz-Δi and ftz-Δi-5’SS mRNA with optimized m6A modification (i.e. DRACH) motifs (AGACT) to enhance methylation (“e-m6A-ftz”). We only observed a slight increase in nuclear retention but it is not significant (see Figure 2A,C in the revised manuscript).

      Again, this result could be explained by the fact that the reporter is too nuclear to detect any significant increase in retention. We had originally performed this in parallel with the no-m6A-ftz-Δi-5’SS reporters but did not report this negative data in the original manuscript.

      __Some rather minor changes to the presentation of the data could enhance the impact of this study.

      Specific Comments:

      The primary question in this manuscript is comparing reporters with m6A site (intact DRACH sequences) to those without. For this reason, organizing the data to the +/- DRACH sites are adjacent to one another might make the most sense. This point is evident in Figure 1C where perhaps simply changing the order of the bars presented to put the ones directly compared adjacent would be preferable. Then the p-value would compare sets of data directly adjacent to one another. __

      We thank the reviewer for this suggestion and we have made these changes to the figures in the preliminary revised manuscript.

      __While the authors show representative fields/cells for most assays, they do an excellent job of providing quantitation as well. One exception is Figure 3D, which shows a single cell image for the most key panel (the 5'SS-containing reporter upon Mettl3 depletion). If there is not a field with more cells, the authors could create a montage. __

      In the revised manuscript, we have replaced this image with one containing multiple cells expressing the reporter.

      __Minor Comments:

      Figure presentation:

      The text in a number of the figures is VERY small (Figures 1B,1C, and 4A) for example. __

      We have fixed this in the new manuscript.

      __Figure 3A includes the label "shRNA:" at the top, but these cells are treated with Mettl3 inhibitor and there does not appear to be any shRNA employed, so this seems like a labeling error. __

      We have fixed this in the new manuscript.

      __In Figure 3C, the immunoblot of Mettl3, there are three bands that all disappear completely upon knockdown of Mettl3- are these all Mettl3? This should at least be mentioned in the legend and perhaps indicated in the figure. The authors do mention in the text employing shRNAs to target multiple Mettl3 isoforms, so likely this is the case. __

      We have clarified these issues in the new manuscript.

      __Minor points (some really minor to just polish the presentation for clarity):

      The word "since" should only be used if there is a time element- otherwise the word "as" is preferable.

      For example on p. 4, the sentence: "Since inhibition of mRNA export typically enhances the nuclear retention of RNAs with intact 5'SS motifs (Lee et al. 2020),.." would more precisely read "As inhibition of mRNA export typically enhances the nuclear retention of RNAs with intact 5'SS motifs (Lee et al. 2020),..". __

      We thank the reviewer for pointing this out. We have fixed this issue in the revised manuscript.

      __Reviewer #2 (Significance (Required)):

      Summary: In the manuscript by Lee et al. entitled "N-6-methyladenosine (m6A) Promotes the Nuclear Retention of mRNAs with Intact 5'Splice Site Motifs", the authors provide evidence that m6A modifications within specific regions of transcripts can confer nuclear retention. These results are important because they add to our understanding of how m6A modifications can contribute to post-transcriptional regulation. Although the authors do not quite come out and say this, data seem to be accumulating to suggest that the location of the m6A modifications within a given transcript can dictate the functional consequences of those modifications.

      This study would be of significant interest to those that study gene expression in any context as well as cell biologists as the data add to our understanding of export of mRNA from the nucleus. This work also adds to our understanding of the biological consequences of m6A modification, which is an area of significant interest. In my opinion, the authors could make a broader conclusion that we do, which is that the location of the modification significantly dictates function- an extension of previous findings mostly focused on processed mRNA transcripts. __

      -------

      __Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Quality control of mRNA is vital for all types of cells. In eukaryotic cells, nuclear export of misprocessed mRNAs containing the 5' splice site is prevented. In this manuscript, Lee and colleagues demonstrate that the nuclear retention of intronic polyadenylated transcripts is dependent on m6A modification. Based on the results shown in yeast, they perform immunoprecipitation experiments and demonstrate the interaction between ZFC3H1, a component of the PAXT complex, and YTHDC1 and YTHDC 2, nuclear YTH RNA-binding proteins that recognize m6A-modified transcripts. The study also shows the interaction of U1-70K with YTHDC1 and with ZFC3H1. Depletion of YTHDC1/2 prevents the nuclear retention of IPA transcripts. Additionally, CLIP-seq analysis is performed, demonstrating that m6A modification is enriched around the 5' splice site motif and the 3' polyadenylation site in IPAs. From these observations, they conclude that m6A modification contributes to the quality control of mRNA by promoting nuclear retention of misprocessed transcripts.

      Major Points 1. The interaction between ZFC3H1 and YTHDC1 is clearly shown by immunoprecipitation of FLAG-tagged YTHDC1 in Figure 1B. However, the co-purification of YTHDC1 with FLAG-tagged ZFC3H1 in Figure 1C is rather ambiguous. Additionally, the immunoprecipitated samples do not appear to show signals corresponding to FLAG-tagged ZFC3H1, making it unclear if the immunoprecipitation is working. It is essential to provide a better quality result to clarify these observations. __

      Please see our responses to reviewer #1. We have repeated the co-immunoprecipitation of FLAG-ZFC3H1 (1-1233) with YTHDC1 under more stringent conditions and have reduced the background binding (see Figure 1B and D in the new manuscript). We have also determined why the FLAG-ZFC3H1 is smaller than expected: the construct contains a premature stop codon. As explained above, we are in the midst of generating a full-length FLAG-ZFC3H1 and we plan to repeat the co-immunoprecipitation with this new construct.

      2. While the authors demonstrate that the m6A modification is dispensable for the targeting of IPA reporter transcripts to the nuclear speckles, it would be valuable to investigate whether m6A is required for their exit from the nuclear speckles. Do reporter transcripts with m6A motifs remain in the nuclear speckles at later time points?

      We have now analyzed the colocalization of nuclear speckles (SC35) with ftz-Δi-5’SS, which contains both a 5’SS and DRACH motifs, and no-m6A-ftz-Δi-5’SS, which contains a 5’SS but lacks DRACH motifs, at steady state – i.e. after 18-24 hours of transfection (as opposed to at early time points as shown in Figure 2D-E of the initial manuscript). Unexpectedly, we see that both mRNAs continue to colocalize with nuclear speckles, although the no-m6A-ftz-Δi-5’SS mRNA is well exported from the nucleus and its signal in nuclear speckles is faint (see Figure 2F-H in the new manuscript).

      Previously, we observed that ftz-Δi-5’SS required the 5’SS motif to remain in nuclear speckles at these later time points (Lee PLOS ONE 2015 and Lee RNA 2022). Upon closer inspection, ftz-Δi-5’SS mRNA also accumulates in additional nuclear foci that are not SC35-positive. Our new results may indicate that m6A marks promote the transfer of mRNAs from nuclear speckles to other foci, but more data is required to make a firm statement. Given this, we plan to conduct further experiments which may take a month to complete:

      Experiment #7) We are now assessing whether these additional ftz-Δi-5’SS foci correspond to either YTHDC-positive foci, which were previously shown to partially overlap nuclear speckles and sequester m6A-rich mRNAs (Cheng Cancer Cell 2022), or “pA+ RNA foci”, which accumulate MTR4/ZFC3H1-targeted RNAs when the nuclear exosome is inhibited (Silla Cell Reports 2018). The latter foci are enriched in ZFC3H1. We plan on co-staining ftz-Δi, ftz-Δi-5’SS, no-m6A-ftz-Δi and no-m6A-ftz-Δi-5’SS with SC35, YTHDC1 and ZFC3H1 to determine whether m6A may help to transfer mRNAs from nuclear speckles to YTHDC1- or ZFC3H1-enriched foci.

      __3. Figures 5B and 5C suggest that ZFC3H1 is required for the degradation of IPA transcripts. However, the range of the vertical axis is inappropriate and it is difficult to assess the extent of the increase in expression levels. Please adjust the vertical axis range for improved clarity. __

      We thank the reviewer for the feedback. We have added additional graphs with an expanded vertical axis to demonstrate that ZFC3H1 is required for the degradation of IPA transcripts.

      Minor Points 1. page 4, line 2 "RNAse" should be corrected to "RNase".

      We thank the reviewer for catching this error. We have fixed this.

      __ 2. page 7, line 5: Is the statement "prevents the nuclear export and decay of non-functional and misprocessed RNAs" correct? m6A modification promotes the decay of such RNAs. __

      We thank the reviewer for pointing this out. We have altered the text to clarify that m6A promotes decay.

      __3. Figure 2E: ftz-∆i should be ftz-∆i-5'SS. __

      We thank the reviewer for catching this error. We have fixed this.

      __4. Figure 5A: It would be helpful to indicate the number of IPA transcripts analyzed. __

      We have included this information.

      __Reviewer #3 (Significance (Required)):

      Overall, the work is sound and generally well-controlled. This study advances our understanding of the quality control of misprocessed transcripts in higher eukaryotes. This reviewer suggests a few points for clarification or improvement. __

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements

      We would like to thank the editorial staff and the reviewers for their handling of our manuscript. We were very pleased with the timely communications from Review Commons, and we are grateful to have been assigned this insightful and constructive group of reviewers.

      The reviewers were well-suited to evaluate our work based on their stated areas of expertise (cancer biology, image analysis, machine learning, cell-based screening, etc.). As such, we received thoughtful and constructive feedback, which we have already incorporated into our attached revision. We are confident that these reviews have improved our manuscript.

      Our goal with this manuscript is to present a proof-of-concept study where high-content imaging and morphological profiling are used to characterize drug resistance in clonal cell lines. The main criticism from reviewers was that our original manuscript may have overstated our method’s ability to discriminate the signal of bortezomib resistance and that any extension beyond cultured cells (to patient samples for example) would require significant follow-up studies. The reviewers suggested that such work would be beyond the scope of our study, and recommended toning down our language to better reflect the limitations of this proof-of-concept work. We have embraced this suggestion, extensively revising our text, and we now believe our language and tone more accurately reflects our results. The reviewers also suggested follow-up computational analyses to more robustly characterize the bortezomib resistance signature. We have performed these analyses and added their description to our revised manuscript. We feel that these analyses have improved understanding of the signature, and will help a reader to gain a deeper understanding of our results and methodology.

      The reviewers also suggested several minor changes; many of which we embraced fully, but others that we chose not to incorporate. We felt that a lack of clarity in our text contributed to these reviewer suggestions. In these cases, we improved clarity in the text and responded to each comment point-by-point in the “prefer not to carry out” section. Further, we address all reviewer comments in the following document point-by-point, grouped by common themes across reviewers (e.g., tone, clarity, analyses, etc.).

      Lastly, a common theme among reviewer comments was their appreciation for our strong methodology and data transparency (examples pasted below). We are extremely gratified by this observation as we feel this is a particular strength of our manuscript. In addition, we were pleased to see reviewers engaged by our work, acknowledging the interest this manuscript is likely to generate among a broad range of scientific disciplines.

      Examples of reviewer appreciation of our strong methodology and data transparency:

      Reviewer 1: “However, this does not imply that the same approach can not achieve the goal, perhaps by using other cell painting markers for bortezomib-sensitivity, or with the same markers to assess sensitivity of different drugs. The cell painting + analysis approaches are not new and the clinical impact is questionable, but the technical aspects (data, analysis) are exceptional and the concept may hold as I described above.”

      Reviewer 2: “The paper is well written, and the text is clear, as is the presentation of data and transparency of methods being utilized. The methods were applied appropriately and followed established standards in the field. The paper's premise is timely and interesting, addressing a pressing issue in cancer therapy: making informed treatment decisions fast, based on markers found in tumors early in tumor development, and using image-based screening for characterizing drug resistance before treatment could be an option. A fascinating bit of the manuscript is the description of the feature selection from the screen is done systematically, considering the technical and biological variability and technical artifacts and modeling covariates using linear models seems a very appropriate way of doing so and could serve as another proof of concept that this is indeed the most robust way of modeling and removing signal of technical covariates from the data.”

      Reviewer 3: “The strengths of this study are the machine learning best practice and detailed methodology. The experiments could be reproduced and statistical analysis is more than adequate. The analysis takes into account batch effects, well position, differences in cell numbers, and other sources of technical variation that complicate high-content image analysis. It is a good exemplar of how unsupervised morphological profiling can be applied to imaging data. The major limitation is the generalizability of this particular method for patient samples. This could be addressed in the Discussion.”

      2. Description of the planned revisions

      We have incorporated all planned revisions.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Text revisions already carried out

      1. [Text revision] We have materially toned down our claims in the manuscript in two distinct areas: A) model performance and B) potential clinical application.

      A) Model performance. We specifically balanced our discussion of the discriminative signal of the Bortezomib Signature. While the signature adequately separated never-before-seen wildtype and resistant clones with metrics well above randomly permuted baselines (accuracy near 80%, average precision about 70%, area under the ROC curve (AUROC) about 84%), there were many limitations that we should have highlighted more explicitly. For example, many individual profiles were incorrectly classified, some clones were predicted entirely incorrectly, and many profiles did not receive Bortezomib Signature scores above the randomly permuted baseline. We now discuss these limitations more clearly and use more balanced language (see key examples of text-based changes below). Additionally, we modified a figure (now Figure 3) to include boxplots of clones that explicitly show the Bortezomib Signature scores of each well profile and permit examination of the strength of the signature for each clone (previously found in Figure 2-Supplement 9). Lastly, we added a new supplementary figure (now Figure 5-Supplement 1) that describes a feature space analysis of misclassified samples. Please note that this figure rearrangement and new analysis helped to balance our claims, but were also performed in response to other tangential reviewer comments.

      B) Clinical application. In the abstract, introduction, and discussion, we further emphasized that this work is a proof of concept, and that more advances must be made prior to clinical application.
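      To make the phrase "above randomly permuted baselines" concrete, the following is a minimal sketch of how an AUROC can be compared with the value expected when labels are shuffled. It is an illustration only and not the analysis code used for the manuscript: the function names, the number of permutations, and the seed are assumptions introduced here.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a resistant profile (label 1)
    receives a higher score than a sensitive profile (label 0).
    Ties count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def permuted_baseline(labels, scores, n_permutations=1000, seed=0):
    """Mean AUROC after repeatedly shuffling the labels, which breaks any
    real association between label and score."""
    rng = random.Random(seed)
    shuffled = list(labels)
    values = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        values.append(auroc(shuffled, scores))
    return sum(values) / len(values)
```

      A signature carries discriminative signal only to the extent that the observed AUROC exceeds the permuted value, which sits near 0.5 for uninformative scores.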

      We made these changes in direct response to the following reviewer comments:

      Reviewer 1 - Major Comment 1 (relevant excerpts)

      While I am convinced that the signature captures morphological phenotypes associated with drug resistance, at the cumulative scale, the discriminative signal of a single cell type seems weak… With Fig. 4, the data fully supports the argument that the bortezomib-signature encodes bortezomib-resistance, but the signal is weak. Thus statements such as "We found the Bortezomib Signature could predict whether a cell line was bortezomib-resistant or bortezomib-sensitive" (line #172) and the specificity statements in the abstract" (line #28) are not supported by the data in my opinion. I would recommend the authors to tune down these and other related statements throughout the manuscript.

      Reviewer cross-commenting - Reviewer 1

      My main critic is regarding "over selling" a weak discriminative signal. Specifically, I am not convinced that the major claims regarding predicting sensitivity and specificity at the single cell types scales are supported by the data. Since reviewer #2 and #3 did not raise this concern I think it is worth discussion here.

      Once these statements are tuned down - I think no significant additional work is needed to make the point that they can measure a discriminative signal. If they want to make these claims, perhaps they'd like to collect more data to gain statistical power (but I am not optimistic this will work at the single cell level).

      Personally, I was happy with the authors' choice of cell lines not included in the training dataset. I am not convinced that additional cell lines + validations are necessary for making the point of a proof of principle.

      Reviewer cross-commenting - Reviewer 2

      I agree that, perhaps, my major criticism of the paper was the manuscript's 'overselling' of claims that were only weakly supported by the data. Yes, if the authors tune down their claims and clearly state that this is an interesting starting point and proof of concept study, it might be ok to publish with only minor revisions. If the claims should be more generalized, then this study needs more data supporting the conclusions and the method's predictive power.

      Reviewer 2 - Major Comment 8

      Lastly, I find some misfits between the question, the model used, and the conclusions drawn. The authors start by exploring the problem of bortezomib resistance in cancer treatment, which they say is a devastating issue for patients with, e.g., multiple myeloma. Yet, the authors use HCT116 as their model cell line, a microsatellite instable, colorectal cell line with several intrinsic mutations that make it a difficult model to address physiologically relevant medical problems after all. The authors then go on to suppose that their method might be suitable to diagnose resistance in patient samples, but I am not convinced this conclusion can be speculated based on data from HCT cells. I suggest the authors test their approach on at least two other cell lines (maybe from different tissues) and benchmark their results against a dataset of digital pathology where such predictions are made from stained and analyzed tissue slices. This way, after a thorough benchmark against related third-party data sets, the method would significantly gain relevance, the paper would appeal to a broader audience, and the advance gains more merit.

      Reviewer 3 - Major Comment 5

      It is not clear from the Discussion whether this type of analysis is more broadly applicable to cell lines derived from patients, rather than selected from a parental cell line, or if this approach would be more efficient than genotyping or next-gen sequencing. How many replicates and ground truth cell lines would be necessary for predictive confidence?

      We edited the last two sentences of the abstract to tone down specificity claims (“provide evidence”) and clarify that we are establishing a “proof-of-concept framework”.

      • This signature predicted bortezomib resistance better than resistance to other drugs targeting the ubiquitin-proteasome system. Our results establish a proof-of-concept framework for the unbiased analysis of drug resistance using high-content microscopy of cancer cells, in the absence of drug treatment.

      We revised the last paragraph of the introduction to contrast bortezomib predictions with ixazomib/CB-5083 predictions, and to remove claims about “using microscopy to guide therapy”.

      • This morphological signature correctly predicted the bortezomib resistance of seven out of ten clones not included in the signature training dataset. Overall, our results establish a proof-of-concept framework for identifying unbiased signatures of drug resistance using high-content microscopy. The ability to identify drug-resistant cells based on morphological features provides a valuable orthogonal method for characterizing resistance in the absence of drug treatment.

      To tone down claims in the figures, we added boxplots to Figure 3 (previously Figure 2) showing the distribution of signature scores per well profile and updated the Figure 4 legend (previously Figure 3).

      • Figure 4. Bortezomib Signature has limited ability to characterize clones resistant to other ubiquitin-proteasome system inhibitors.

      We modified the following text in the discussion to tone down claims of specificity and clinical utility:

      • This Bortezomib Signature correctly predicted the bortezomib resistance of seven out of ten clones not included in the training dataset and was more specific to bortezomib-resistance given its limited ability to identify clones that were resistant to other UPS-targeting drugs.

      Though it is unclear whether this method can be extended to patient samples, where identifying intrinsic drug resistance in cells prior to treatment has the potential to improve targeted cancer therapy, our results are an encouraging proof of concept. We expect that further refinement may develop Cell Painting as a tool for identifying drug-resistant cells, perhaps even guiding strategies to overcome intrinsic resistance.

      1. [Text revision] We defined LD50 in the text (originally line #97), changed the description of resistant clone selection to remove main text references to LD90 (originally line #87), and stated the drug concentrations used for selection in the Methods. We also defined LD90 in the Methods and described its role in determining the drug concentrations to use for clone selection. This change was in response to the following comments:

      Reviewer 1 - Minor Comment 2

      What is LD90 (line #87)? LD50 (line #97)?

      Reviewer 2 - Minor Comment 5

      What was the LD 90 per drug on HCT cells? Rather than LD90 foldchanges, absolute concentrations should be used in the results and discussion to allow the reader to vet the conclusions.

      • To determine the appropriate drug concentrations to use in order to isolate drug-resistant clones, we performed proliferation assays on HCT116 parental cells with our drugs of interest: bortezomib (proteasome inhibitor), ixazomib (proteasome inhibitor), or CB-5083 (p97 inhibitor) (Fig. 1-Supplement 1 A-D).
      • We characterized the bortezomib-resistant clones and found that the median lethal doses (LD50s) were ~2.8- to ~9-fold that of HCT116 parental cells (Fig. 1-Supplement 2 B).
      • Briefly, HCT116 cells were plated in 150 mm dishes and grown in the presence of the desired drug at a concentration that resulted in the death of the majority of cells (selection concentrations: bortezomib, 12 nM; ixazomib, 150 nM; CB-5083, 600 and 700 nM).
      • Using the data from our proliferation assays, we calculated the median lethal dose (LD50) for each of our drugs of interest by fitting data of normalized growth vs. log[drug concentration] to a sigmoidal dose-response curve using GraphPad Prism (v.9.2.0) (Fig. 1-Supplement 1 D).
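      For readers unfamiliar with the procedure, the LD50 estimate described above can be sketched as a least-squares fit of a sigmoidal curve to normalized growth versus log concentration. This is a simplified stand-in and not GraphPad Prism's algorithm: it assumes a two-parameter curve fixed at 1 (untreated) and 0 (complete killing), uses a coarse grid search instead of nonlinear regression, and all function names are hypothetical.

```python
import math


def sigmoid_response(log_conc, log_ld50, hill):
    """Normalized growth (1 = untreated, 0 = fully killed) at a given
    log10 drug concentration."""
    return 1.0 / (1.0 + 10 ** (hill * (log_conc - log_ld50)))


def fit_ld50(concs_nm, growth, hill_grid=None, steps=400):
    """Grid search for the log10(LD50) and Hill slope that minimize the
    sum of squared errors. Returns (LD50 in nM, Hill slope)."""
    logs = [math.log10(c) for c in concs_nm]
    lo, hi = min(logs), max(logs)
    # Candidate Hill slopes from 0.5 to 6.0 (an assumed search range).
    hill_grid = hill_grid or [0.5 + 0.1 * i for i in range(56)]
    best = None
    for i in range(steps + 1):
        log_ld50 = lo + (hi - lo) * i / steps
        for hill in hill_grid:
            sse = sum(
                (sigmoid_response(x, log_ld50, hill) - y) ** 2
                for x, y in zip(logs, growth)
            )
            if best is None or sse < best[0]:
                best = (sse, log_ld50, hill)
    _, log_ld50, hill = best
    return 10 ** log_ld50, hill
```

      Fold resistance, as reported for the clones, is then the ratio of a clone's fitted LD50 to that of the HCT116 parental cells.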

      • [Text revision] We thank the reviewer for allowing us an opportunity to improve clarity on the clones we used. We now describe the total number of clones generated and have removed unnecessary references to specific clones for ease of reading (originally lines #96-98). We maintain all references to specific clones in the figures, legends, supplement, and methods.

      Reviewer 1 - Minor Comment 3

      It was not clear to me in the text which and how many cell lines were evaluated and the reader is forced to go to the SI. For example, "(BZ01-10 and BZ clones A and E)" (line #96-97) and "wild-type clones (WT01-05, 10, and 12-15)" (line #98) appeared when presenting the results without a clear explanation and made it harder for me to follow. Summary of the data (for example, based on Figure 2-Supplement 8) can be briefly mentioned in the text to make it more clear for the reader.

      We added the following to the second paragraph of the results:

      • Together these methods provided a total of twelve bortezomib-resistant, five ixazomib-resistant, five CB-5083-resistant, and twelve bortezomib-sensitive clones as well as HCT116 parental cells for our experiments.

      [Text revision] We removed duplicate text (originally lines #115-125).

      Reviewer 1 - Minor Comment 5

      1. Lines #104-111 were duplicated in lines #114-122.

      Reviewer 3 - Minor Comment 4

      Ten lines of text are duplicated on page 5.

      Reviewer 2 - Minor Comment 4

      on page 5, paragraph 4, there is a sizeable copy-and-paste error of text being identically replicated.

      1. [Text revision] We provided more intuition for the Bortezomib Signature in the results section (originally lines #150-151).

      Reviewer 1 - Minor Comment 6

      The "Bortezomib Signature" is a critical measurement but is only briefly mentioned in lines 150-151 ("..based on the direction-sensitive ranking method for phenotype analysis, singscore (Foroutan et al., 2018)"). Please provide more information/intuition.

      • We used these 45 features to compute a rank-based resistance score or “Bortezomib Signature” for each well profile based on the direction-sensitive method called singscore (Foroutan et al. 2018). Singscore ranks these 45 resistance-related features on a per sample basis and calculates a normalized score between -1 and 1, with higher values expected for bortezomib-resistant clones and lower values expected for bortezomib-sensitive clones.
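      The idea of a direction-sensitive, rank-based score can be sketched as follows. This is a simplified illustration in the spirit of singscore and not the singscore package itself: it assumes no tied feature values, assumes each feature set is smaller than the full profile, and the function and feature names are hypothetical.

```python
def _normalized_mean_rank(ranks, members, n_total):
    """Mean rank of the member features, rescaled to lie in [-0.5, 0.5]
    using the smallest and largest mean rank the set could attain."""
    n = len(members)
    mean_rank = sum(ranks[f] for f in members) / n / n_total
    s_min = (n + 1) / (2 * n_total)
    s_max = (2 * n_total - n + 1) / (2 * n_total)
    return (mean_rank - s_min) / (s_max - s_min) - 0.5


def rank_signature_score(profile, up_features, down_features):
    """Score one sample (a dict of feature name -> value) between -1 and 1.

    up_features are expected to be high in resistant samples and
    down_features low, so the down set is ranked in reverse order."""
    n_total = len(profile)
    ascending = sorted(profile, key=profile.get)
    up_ranks = {f: i + 1 for i, f in enumerate(ascending)}
    down_ranks = {f: n_total - i for i, f in enumerate(ascending)}
    return (
        _normalized_mean_rank(up_ranks, up_features, n_total)
        + _normalized_mean_rank(down_ranks, down_features, n_total)
    )
```

      Because the score depends only on within-sample ranks, it can be computed for a single well profile without reference to the other samples.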

      • [Text revision] We clarified that RNA sequencing had been performed solely on clones A and E in a previous study (originally lines #88-90). Furthermore, one of the strengths of our approach is that it can identify resistant clones in an unbiased fashion prior to molecular characterization. Sequencing the newly generated clones is beyond the scope of the present paper.

      Reviewer 2 - Minor Comment 3

      The authors talk about validating the mutation - PSMB5 by RNA-seq. However, the data for the genotyping/sequencing/characterization of these newly generated BZ-resistant lines are missing.

      In the results, we clarify that the sequencing of clones A and E was performed in previous work:

      • We also isolated bortezomib-sensitive (wild-type; WT) clones by dilution of the HCT116 parental cell line and acquired two bortezomib-resistant clones (BZ clones A and E) both with mutations in PSMB5 identified by RNA sequencing performed in previous work (Fig. 1-Supplement 1 E) (Wacker et al. 2012).

      In the last paragraph of the discussion, we highlight the strength of our unbiased approach:

      • Together, our work has demonstrated the potential for morphological profiling with Cell Painting to be used as an unbiased method to characterize resistance in the absence of drug treatment. Our results indicate that different mechanisms of bortezomib resistance may generate distinct morphological profiles; with larger and broader training datasets, it may be possible to identify signatures for distinct mechanisms of bortezomib resistance as well as signatures of resistance to other drugs. Though it is unclear whether this method can be extended to patient samples, where identifying intrinsic drug resistance in cells prior to treatment has the potential to improve targeted cancer therapy, our results are an encouraging proof of concept. We expect that further refinement may develop Cell Painting as a tool for identifying drug-resistant cells, perhaps even guiding strategies to overcome intrinsic resistance.

      • [Text revision] We thank the reviewers for their suggestions. We agree that the description of the experimental design was somewhat unclear and have provided greater detail and clarity, particularly regarding the generation of clones. We used the HCT116 parental cell line to generate drug-resistant clones by identifying single surviving cells after drug treatment and allowing these cells to expand prior to isolating colonies for experimentation. We did not perform experiments to confirm whether these “clones” were isogenic and cannot exclude cell migration during expansion or genetic drift as confounding factors. However, we have provided greater detail in the descriptions of our method for clone isolation in order to address this concern.

      Reviewer 1 - Minor Comment 1

      More information in Fig. 1's legend would be helpful to follow the experimental design. I found it hard to follow in its current form and had to go back to carefully reading the main text to fully understand.

      Reviewer 2 - Minor Comment 6

      The description of the resistant clonal populations is confusing. As I understand, no single-cell clones were isolated during the selection procedure. Thus, the training lines are not yet isogenic clones but oligoclonal sub-populations of the parental cell line. The authors could provide more details here and discuss the different characteristics of their sub-populations, e.g., their growth kinetics or molecular alterations.

      We bolstered the description in the results.

      • We first isolated and characterized drug-resistant cells (Fig. 1 A). To isolate drug-resistant clones, we used an approach we have described previously (Wacker et al. 2012; Kasap, Elemento, and Kapoor 2014) and the HCT116 cell line. These cancer cells express multidrug resistance pumps at low levels and are mismatch repair deficient, providing a genetically heterogeneous polyclonal population of cells (Umar et al. 1994; Papadopoulos et al. 1994; Teraishi et al. 2005) allowing for isolation of drug-resistant clones in 2-3 weeks. We hypothesize that a rapid selection of resistance could favor the isolation of clones with intrinsic resistance. To determine the appropriate drug concentrations to use in order to isolate drug-resistant clones, we performed proliferation assays on HCT116 parental cells with our drugs of interest: bortezomib, ixazomib, or CB-5083 (Fig. 1-Supplement 1 A-D). We also isolated bortezomib-sensitive (wild-type; WT) clones by dilution of the HCT116 parental cell line and acquired two published bortezomib-resistant clones (BZ clones A and E) both with mutations in PSMB5 identified by RNA sequencing performed in previous work (Fig. 1-Supplement 1 E) (Wacker et al. 2012). We characterized the bortezomib-resistant clones and found that the median lethal doses (LD50s) for bortezomib were ~2.8- to ~9-fold that of HCT116 parental cells (Fig. 1-Supplement 2 B). In contrast, bortezomib-sensitive clones had LD50s for bortezomib that ranged from ~0.7- to ~1.2-fold that of HCT116 parental cells (Fig. 1-Supplement 2 A). Together these methods provided a total of twelve bortezomib-resistant, five ixazomib-resistant, five CB-5083-resistant, and twelve bortezomib-sensitive clones as well as HCT116 parental cells for our experiments.

      We also updated the legend for Figure 1A.

      • Figure 1. Experimental design for using Cell Painting to examine morphological profiles of drug-resistant cells. (A) Graphic of the experimental workflow: we isolated drug-resistant clones by treating parental HCT116 cells with a high dose of the desired drug and then expanded them for experiments. We isolated drug-sensitive clones by diluting HCT116 cells and then expanded them for experiments. We then performed proliferation assays on select clones to screen for multidrug resistance. Next, we performed Cell Painting on both drug-resistant and -sensitive clones, using multiplexed high-throughput fluorescence microscopy of fixed cells followed by feature extraction and morphological profiling to search for features that contribute to a signature of drug resistance.

      • [Text revision] We clarified that the Bortezomib Signature did not correspond to well position (originally lines #155-157).

      Reviewer 1 - Minor Comment 9

      Line #155-156: "We found that the pattern of Bortezomib Signatures corresponded to the cell identity plate layout", the word "not" is missing before "corresponded".

      We found that the pattern of Bortezomib Signatures did not correspond to well position relative to the plate (Fig. 2-Supplement 7 B), indicating that the well position for each clone was not strongly contributing to its Bortezomib Signature.

      1. [Text revision] We explicitly described the result that some misclassified clones (WT10, WT15, and BZ06) did not have unexpected bortezomib sensitivity as determined by proliferation assays. We also moved the supplementary figure to an updated Figure 3 to better highlight this result (described below in “Figure revisions already carried out”). Lastly, we add a new figure (Figure 5-Supplement 1) to more explicitly analyze the misclassified lines (described below in “New analyses already carried out”).

      Reviewer 3 - Minor Comment 3

      The bortezomib sensitivity of the WT lines used in the last experiments was determined and did not seem to be greater than parental. This could be mentioned in the text; the figure raises the question and the answer is provided, but it's in the supplemental material.

      While the Bortezomib Signature correctly characterized the bortezomib sensitivity of most clones, it consistently misclassified others (WT10, WT15, and BZ06) (Fig 5-Supplement 1 A). Proliferation assays conducted in earlier experiments showed that WT10 and WT15 were sensitive to bortezomib while BZ06 was resistant (Fig. 1-Supplement 2 A and B). By comparing these incorrect predictions with high-confidence correct predictions, we observed differences that varied by clone type, suggesting unique morphology may be driving each of these misclassifications (Fig. 5-Supplement 1 B and C). These results are consistent with the Bortezomib Signature being generalizable to clones not included in the training dataset and suggest that morphological profiling has the potential to identify bortezomib-resistant clones based on the morphological features of cells in the absence of drug treatment.

      1. [Text revision] We clarified that the metrics (accuracy and average precision) were based on median Bortezomib Signature scores of all replicate well-level profiles per clone. We can compare samples based on rank and on their difference from the 95% confidence interval of the permuted data. Our method currently has no way to assign a likelihood. Also note that we have updated the discussion to cover alternative metrics (see Reviewer 1 - Minor Comment 7). These are very important distinctions, and we are grateful to the reviewer for bringing them up.

      Reviewer 3 - Major Comment 3

      The study classifies cells as binary sensitive or resistant, but would results be improved by scoring based on likelihood of being resistant/sensitive?

      Reviewer 3 - Minor Comment 2

      It is not clear whether the accuracy was based on a percentage of replicates per cell line that were classified correctly or whether that was referring to classification of the cell line overall as sensitive/resistant.

      • We next examined whether the Bortezomib Signature was able to predict the bortezomib resistance of a clone based on morphological profiling data (Fig. 3 A-E and Fig. 3-Supplement 2 A and B). We called the clone bortezomib-resistant if the median Bortezomib Signature of all replicate well profiles was greater than zero and bortezomib-sensitive if the median Bortezomib Signature was less than zero. In the training dataset, the Bortezomib Signature correctly predicted the bortezomib resistance of all ten clones, with median Bortezomib Signatures for eight out of ten clones beyond the 95% confidence interval for the randomly permuted data (Fig. 3 A). The accuracy of the Bortezomib Signature was 88% while the average precision was 81% for the training dataset (Fig. 3-Supplement 2 A and B) (see Methods). The signature performed similarly well in the validation dataset (Fig. 3 B), with an accuracy of 92% and an average precision of 89% (Fig. 3-Supplement 2 A and B). In the test dataset the Bortezomib Signature correctly predicted the bortezomib resistance of all clones, though only HCT116 parental cells had a median Bortezomib Signature outside the 95% confidence interval for the randomly permuted data (Fig. 3 C). The test dataset had an accuracy of 80% and an average precision of 68% (Fig. 3-Supplement 2 A and B). Similarly, in the holdout dataset the Bortezomib Signature had an accuracy of 78% and an average precision of 69% (Fig. 3-Supplement 2 A and B), and correctly predicted the bortezomib resistance of twelve out of thirteen clones, with WT01 misclassified as bortezomib-resistant (Fig. 3 D). In the holdout dataset, four of the twelve correctly characterized clones had median Bortezomib Signatures outside the 95% confidence interval for the randomly permuted data.
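
      The calling rule described above can be sketched as follows. This is a minimal illustration rather than the code in our repository: the function name `classify_clone` is hypothetical, and we assume here that the 95% interval is taken as the 2.5th to 97.5th percentiles of scores computed on randomly permuted data.

```python
import numpy as np


def classify_clone(well_scores, permuted_scores):
    """Call a clone from the signature scores of its replicate well profiles.

    Returns (call, median_score, outside_interval).
    """
    median_score = float(np.median(well_scores))

    # Assumption: the 95% interval is the 2.5th-97.5th percentile range
    # of signature scores obtained from randomly permuted data.
    lower, upper = np.percentile(permuted_scores, [2.5, 97.5])

    # Median above zero -> resistant, below zero -> sensitive.
    # A median of exactly zero is not covered by the rule in the text;
    # it falls into "sensitive" here purely as an arbitrary choice.
    call = "resistant" if median_score > 0 else "sensitive"

    # A call is treated as higher confidence when the median lies
    # outside the interval of the permuted data.
    outside_interval = bool(median_score < lower or median_score > upper)
    return call, median_score, outside_interval
```

      Under these assumptions, a clone whose replicate wells score 0.5, 0.6, and 0.7 would be called resistant, and the call would fall outside a narrow permutation interval centered on zero.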

      We also mirrored language when discussing the ixazomib and CB-5083 results.

      • However, only two of the four correctly identified ixazomib-resistant clones and one of the three CB-5083-resistant clones had median Bortezomib Signatures outside the 95% confidence interval of the randomly permuted data. The area under the ROC (AUROC) curve for ixazomib-resistant and CB-5083-resistant clones (0.63 and 0.60, respectively) was lower than those calculated for the training, validation, test, and holdout datasets. In addition, many of the Bortezomib Signatures for well profiles of ixazomib- and CB-5083-resistant clones, particularly those for CB-5083-resistant clones, landed within the 95% confidence interval of the randomly permuted data. These results suggest that the Bortezomib Signature is not a general signature of UPS-targeting drug resistance and instead has some specificity for bortezomib.
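
      For readers less familiar with the three metrics we report, the sketch below shows how accuracy, average precision, and AUROC can be computed from signature scores with scikit-learn. The labels and scores are made up for illustration, the helper name `signature_metrics` is hypothetical, and thresholding at zero mirrors the calling rule described in the text.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    roc_auc_score,
)


def signature_metrics(is_resistant, scores):
    """Summarize how well signature scores separate resistant from sensitive."""
    is_resistant = np.asarray(is_resistant, dtype=int)
    scores = np.asarray(scores, dtype=float)

    # Binary call: a score above zero is read as "resistant".
    predicted = (scores > 0).astype(int)

    return {
        # Fraction of calls that match the known status.
        "accuracy": accuracy_score(is_resistant, predicted),
        # Threshold-free summaries computed from the raw scores.
        "average_precision": average_precision_score(is_resistant, scores),
        "auroc": roc_auc_score(is_resistant, scores),
    }


# Illustrative (made-up) data: 1 = resistant, 0 = sensitive.
example = signature_metrics(
    [1, 1, 1, 0, 0, 0],
    [0.9, 0.4, -0.1, 0.2, -0.3, -0.8],
)
```

      Accuracy depends on the zero threshold, whereas average precision and AUROC depend only on how the scores rank the samples, which is why the two kinds of metric can disagree.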

      • [Text revision] We added an explicit note that our image analysis pipelines are also publicly available. Our data processing pipelines are fully documented, well above the standards in our field. Linking the publicly available resources with these methods maximizes reproducibility.

      Reviewer 1 - Minor Comment 10

      Additional details on the processing steps in the analysis pipeline in the Methods will be highly appreciated.

      We include all image analysis pipelines at https://github.com/broadinstitute/profiling-resistance-mechanisms (G. Way et al. 2023).

      1. [Text revision] We have compared our approach to the on-disease/off-disease scores introduced in (Heiser et al. 2020). We agree with the reviewer that a discussion of these two methods would help clarify our phenotypic signature concept. The on/off score measures the degree to which a perturbation pushes disease towards a healthy state. In that case there are three sets of data: healthy samples (used for training), disease samples (used for training), and the sample we want to score, which should be of the form "disease + perturbation". Our approach, based on singscore, also has three sets of data: sensitive samples (used for training), resistant samples (used for training), and the sample we want to score. Here, the sample we want to score could be anything, not necessarily of the form "resistance + perturbation". Furthermore, singscore does not have the concept of orthogonality to resistance/sensitivity. This would become relevant if we were exploring perturbations or conditions that would induce a resistant cell line to become sensitive, but we are not doing that here. There are other statistical differences (projection vs. rank-based, etc.), but the key difference is the applicability of the method to the specific problem at hand.

      Reviewer 1 - Minor Comment 7

      How is the Bortezomib Signature related to the "on-disease"/"off-disease" scores described in https://www.biorxiv.org/content/10.1101/2020.04.21.054387v1.full? Are there other alternatives used for similar binary phenotypic signatures? What is the justification for using these measurements? I would love to see this generalized concept explicitly discussed in the Discussion.

      We added the following to the discussion.

      • The Bortezomib Signature is conceptually similar to the on-disease/off-disease score (Heiser et al. 2020). Both require three phenotypic measurements: a target phenotype representing ideal, a disease phenotype, and a new phenotype to classify. However, our approach is technically different (non-parametric compared to linear projection) and our goals are different (phenotypic classification compared to perturbation alignment). Other methods also enable phenotype labeling, but they focus on single-sample annotation without regard to a target phenotype (Wawer et al. 2014; Rohban et al. 2017; Simm et al. 2018; Nyffeler et al. 2020).
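
      To make the "rank-based" distinction concrete, the sketch below scores one profile from the within-sample ranks of its features. It is a deliberately simplified illustration and not the singscore implementation (Foroutan et al. 2018), which applies its own normalization; the function name and the feature index lists are hypothetical.

```python
import numpy as np


def rank_signature_score(profile, up_idx, down_idx):
    """Simplified rank-based score for a single profile.

    up_idx:   indices of features expected to be high in resistant cells
    down_idx: indices of features expected to be low in resistant cells
    """
    profile = np.asarray(profile, dtype=float)

    # Rank features within the sample (1 = smallest value).
    # Ties are broken by position here, which a real implementation
    # would handle more carefully.
    ranks = np.argsort(np.argsort(profile)) + 1
    normalized = ranks / profile.size

    # Only the ordering of features matters, not their magnitudes:
    # this is what makes the score non-parametric.
    return float(normalized[up_idx].mean() - normalized[down_idx].mean())


# A profile whose "up" features rank highest gets a positive score.
score = rank_signature_score([5, 4, 0, 1, 2, 3], up_idx=[0, 1], down_idx=[2, 3])
```

      Because the score uses only within-sample ranks, any profile can be scored on its own, whereas a projection-based score is defined relative to the axis between the two reference phenotypes.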

      Figure revisions already carried out

      1. [Figure revision] We moved all boxplots from the original Fig. 2-Supplement 9 to the main text (also splitting Fig. 2 into Fig. 2 and 3). From the original Figure 2, we moved the accuracy and average precision bar graphs to the supplement. We also note that this change increases transparency of the discriminative signal of our signature.

      Reviewer 1 - Minor Comment 8

      I would highly recommend showing the Bortezomib Signatures from Figure 2-Supplement 9. in Fig. 2. This was the main measurement used throughout the manuscript and in my opinion, it is very important to consistently visualize the data along the manuscript, for clarity and easier reader interpretation.

      1. [Figure revision] We adjusted the position of the legend in the accuracy and average precision bar graphs (originally Fig. 2 C and D, now Fig. 3-Supplement 2) for clarity. We also note that keeping the bar chart here is standard best practice (compared to a dot plot).

      Reviewer 1 - Minor Comment 4

      I found the visualization in Fig. 2C-D not intuitive (it is properly explained in the legend). I suggest replacing the accuracy colorbar with a color marker to make it more distinct from the random permutation (|--*--|) The location of the text "mean +- SD of 100 random permutation" made me first think that it is linked to the holdout.

      1. [Figure revision] We changed the point distribution in the boxplots (from expanded to standard) to minimize overlap with the boxplot lines. We also updated the legend text to indicate that individual points in boxplots represent the Bortezomib Signature for well profiles. Note, we paste a representative example of this change above (new Figure 3).

      Reviewer 3 - Minor Comment 1

      I found the box plots somewhat difficult to interpret (especially where the WT lines had a lot of overlap with the red shaded area). Do the points in these charts correspond to replicate wells?

      We also update the figure legend.

      • Plots show values for individual well profiles (points), range (error bars), 25th and 75th percentiles (box boundaries), and median.

      • [Figure revision] [Response to Reviewer 2 - Major Comment 7] We thank the reviewer for allowing us an opportunity to clarify the mechanism. We feel that it is beyond the scope of this manuscript to disentangle the molecular alterations that cause bortezomib resistance based on our Cell Painting insights. This wet lab experimental process is arduous and cost-prohibitive, and we argue that one of the benefits of taking a morphology approach to resistance status is that we can detect resistant cells (and therefore cells that won’t die when presented with a treatment) without knowing the molecular mechanism.

      Nevertheless, the reviewer has encouraged us to enhance the ability for a reader to view and interpret the signature to perhaps more easily facilitate future work. Previously, we presented our signature in text form in Figure 2-Supplement 4 and in heatmap form in Figure 2-Supplement 5. Here, we add a new figure (Figure 2-Supplement 6; pasted below) which will improve interpretability.

      Reviewer 2 - Major Comment 7:

      Next to feature importance, the authors do not discuss (or I missed) what biology the features represent. Such the reader is left wondering what the actual mechanism of bortezomib resistance could be and if cell painting could shed light on the molecular alterations that cause the treatment resistance. While reviewing, I thus wondered which audience the authors targeted with their manuscript. A more focused analysis of their data that highlights aspects of the study either for the machine learning community, the cell biology community, or the precision oncology community would greatly benefit the manuscript's impact. In its current form, the study's findings seem diluted and spread across a wide range of research questions.

      • Figure 2-Supplement 6. Bortezomib Signature visualized by CellProfiler features. Visualization of CellProfiler features contributing to the Bortezomib Signature. Features with high values (mean signature estimates) in resistant cells are purple while features with low values in resistant cells are green. The mean signature estimates were based on Tukey's Honestly Significant Difference test score and the number in each box represents the number of features used to calculate the mean signature estimate.

      Additionally, we add the following to the results section:

      • We then examined the grouping of features across compartments and channels and found radial distribution features were higher in resistant cells (Fig 2-Supplement 6).

      The code change to generate the signature visualization summary is available at: https://github.com/broadinstitute/profiling-resistance-mechanisms/pull/131

      New analyses already carried out

      1. [New analysis] [Response to Reviewer 2 - Major Comment 5] We agree that a systematic analysis of feature selection methods will provide additional insights not already in the manuscript. Therefore, we have performed two new computational experiments to compare our linear modeling feature selection approach against other standard approaches. We demonstrate that our linear modeling approach is effective at isolating the core differences between resistant and sensitive classes.

      Specifically, we performed two analyses: A) UMAP and B) k-means cluster analysis. We analyzed profiles defined by four different feature selection approaches: 1) using all traditional CellProfiler features; 2) using the traditional CellProfiler feature selection approach (removing low-variance features, highly correlated features, etc.); 3) using 45 random features (the same size as the Bortezomib Signature); and 4) using only the Bortezomib Signature features. We performed Fisher’s exact tests to derive odds ratios of cluster membership by resistance status and calculated Silhouette widths to quantify the relative proximity of clusters.

      This analysis generates a new supplementary figure (see below) and demonstrates that the linear-modeling-based feature selection isolated the features driving the differences between the clone types (resistant vs. wildtype), while the standard approaches separate the clone types less effectively.

      Reviewer 2 - Major Comment 5:

      A fascinating bit of the manuscript is the description of the feature selection from the screen is done systematically, considering the technical and biological variability and technical artifacts and modeling covariates using linear models seems a very appropriate way of doing so and could serve as another proof of concept that this is indeed the most robust way of modeling and removing signal of technical covariates from the data. Yet, I wondered why the authors do not discuss other means of feature selection or dimensionality reduction; further, they need to show how the features cluster the cell lines or why impact (information content) different features deliver. For an audience interested in the technical aspects of cell painting analysis and machine learning based on the data, that would, IMHO, be the most exciting questions.

      • Figure 3-Supplement 3. Benchmarking linear-modeling feature selection to separate clones by bortezomib resistance. Uniform Manifold Approximation and Projection (UMAP) analysis of the qualitative separability of (A) resistance status and (B) Bortezomib Signature scores across four different feature spaces. (C) k-means clustering from k=2 to k=14 of average odds ratio, maximum odds ratio (Fisher’s exact test), and Silhouette width using Bortezomib Signature features.

      Additionally, we add the following to the results section:

      • We then compared our linear-modeling approach to feature selection against other feature spaces and found that the Bortezomib Signature clusters same-type clones (bortezomib-resistant vs. bortezomib-sensitive) with higher enrichment compared to the full feature space, standard feature selection (see Methods), or a random selection of 45 features (Fig 3-Supplement 3).

      And methods section, describing this analysis:

      • We were also interested in comparing the ability of different feature spaces to cluster clones of the same type (resistant vs. sensitive). This analysis would determine if the Bortezomib Signature features, which we derived using linear modeling to isolate biological from technical variables, had a greater ability to cluster. We compared the Bortezomib Signature against three other feature spaces: 1) the full feature space, 2) standard feature selection (see Image data processing methods), and 3) 45 randomly selected features. We performed two analyses using these four feature spaces including Uniform Manifold Approximation and Projection (UMAP) (McInnes et al. 2018) and k-means clustering. For UMAP, we used default umap-learn parameters to identify two UMAP coordinates per feature space. We then visualized the clusters by their resistance status and Bortezomib Signature score. The UMAP analysis represents a qualitative analysis. Next, we applied k-means clustering with 25 initializations across a range of 2-14 clusters (k). Prior to clustering and for each feature space, we applied principal component analysis (PCA) and transformed each feature space into 30 principal components. This step was necessary to compare k-means clustering metrics, which are sensitive to the feature space dimensionality. We applied a Fisher’s exact test to each cluster using a two-by-two contingency matrix that specified cluster membership for each clone classification (resistant vs. sensitive). We visualized the mean odds ratio and max cluster odds ratio for each feature space across k. A high odds ratio tells us that the feature space effectively clusters clones of the same resistance status. Lastly, we calculated Silhouette width (the average proximity between samples in one cluster to the second nearest cluster) for each feature space across k.
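
      The clustering comparison described above can be sketched with scikit-learn and SciPy as follows. This is an illustration on synthetic data, not the code from the pull request linked below: the function name `cluster_enrichment`, the synthetic profiles, and the use of scikit-learn's standard `silhouette_score` are assumptions.

```python
import numpy as np
from scipy.stats import fisher_exact
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score


def cluster_enrichment(features, is_resistant, k, n_components=30, seed=0):
    """Cluster profiles and test each cluster for resistance enrichment.

    Returns (per-cluster odds ratios, silhouette score).
    """
    is_resistant = np.asarray(is_resistant, dtype=bool)

    # Project every feature space to the same dimensionality first,
    # so clustering metrics are comparable across feature spaces.
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(features)

    # k-means with 25 initializations, as described in the text.
    labels = KMeans(n_clusters=k, n_init=25, random_state=seed).fit_predict(reduced)

    odds_ratios = []
    for cluster in range(k):
        in_cluster = labels == cluster
        # Two-by-two table: cluster membership vs. resistance status.
        table = [
            [np.sum(in_cluster & is_resistant), np.sum(in_cluster & ~is_resistant)],
            [np.sum(~in_cluster & is_resistant), np.sum(~in_cluster & ~is_resistant)],
        ]
        odds_ratio, _ = fisher_exact(table)
        odds_ratios.append(odds_ratio)  # may be inf for a pure cluster

    return np.array(odds_ratios), float(silhouette_score(reduced, labels))


# Synthetic stand-in for a 45-feature space: 100 resistant, 100 sensitive.
rng = np.random.default_rng(0)
status = np.repeat([True, False], 100)
profiles = rng.normal(size=(200, 45)) + 3.0 * status[:, None]

for k in (2, 3):
    ratios, silhouette = cluster_enrichment(profiles, status, k)
    print(k, ratios.max(), silhouette)
```

      Running the same function on each feature space across k = 2 to 14 and plotting the mean and maximum odds ratios would give a comparison of the kind summarized in Figure 3-Supplement 3 C. Note that a cluster containing only one clone type yields an infinite odds ratio, which needs care when averaging.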

      The code change to derive the UMAP coordinates, perform clustering, and generate the figure is available at https://github.com/broadinstitute/profiling-resistance-mechanisms/pull/132

      1. [New analysis] [Response to Reviewer 3 - Major Comment 1] We thank the reviewer for this suggestion, which allowed us to explore the misclassified samples in more depth. We added a new supplementary figure in which we summarized all bortezomib clones (wildtype and resistant) in their accuracy based on the bortezomib signature (panel A). We did not include training set samples in this analysis. Using samples that were consistently incorrectly classified with high confidence (three samples: WT15, BZ06, WT10) we performed two separate two-sample Kolmogorov–Smirnov (KS) tests. Specifically, we compared high incorrect wildtype to high correct wildtype and high incorrect resistant to high correct resistant. Our results indicate that most bortezomib signatures were significantly different between correct and incorrect assignments (panel B), and that the signature features varied between resistant and wildtype misclassification tests (panel C).

      Reviewer 3 - Major Comment 1:

      While the claims are largely substantiated, there are a few points where further consideration would improve the manuscript. Several cell lines were mis-classified with what appears to be a high degree of certainty. Can the authors tell what was driving those predictions? Was there something in the morphological signature that weighed more heavily in those cases?

      • Figure 5-Supplement 1. Examining the accuracy of clone classification and misclassification of clones. (A) Proportion of high-confidence correct, low-confidence correct, low-confidence incorrect, and high-confidence incorrect predictions of well profiles across clones in the test, holdout, and validation sets. High-confidence predictions (high) had Bortezomib Signatures greater than (resistant clones) or less than (sensitive clones) the 95% confidence interval of randomly permuted data, while low-confidence predictions (low) had Bortezomib Signatures within the 95% confidence interval of randomly permuted data. (B) Visualization of Kolmogorov-Smirnov (KS) test statistic means of feature groups across channels and cellular compartments. (C) Plot of the KS test statistic means for feature groups in bortezomib-resistant vs. -sensitive cells. Each feature group is color coded by the imaging channel.

      Additionally, we add the following to the results section:

      • While the Bortezomib Signature correctly characterized the bortezomib sensitivity of most clones, it consistently misclassified others (WT10, WT15, and BZ06) (Fig 5-Supplement 1 A). Proliferation assays conducted in earlier experiments showed that WT10 and WT15 were sensitive to bortezomib while BZ06 was resistant (Fig. 1-Supplement 2 A and B). By comparing these incorrect predictions with high-confidence correct predictions, we observed differences that varied by clone type, suggesting unique morphology may be driving each of these misclassifications (Fig. 5-Supplement 1 B and C). These results are consistent with the Bortezomib Signature being generalizable to clones not included in the training dataset and suggest that morphological profiling has the potential to identify bortezomib-resistant clones based on the morphological features of cells in the absence of drug treatment.

      And methods section, describing this analysis:

      Some profiles were consistently predicted incorrectly with high confidence, that is, in the opposite direction to their true class (see Figure 5-Supplement 1). For a well-level profile to be categorized as high-confidence (in either the correct or incorrect direction), it needed to score beyond the 95% confidence interval of the randomly permuted data. For example, a high-confidence incorrect resistant profile would have a Bortezomib Signature below the 95% confidence interval of the randomly permuted data. To evaluate the features driving the differences in these samples, we applied two-sample Kolmogorov–Smirnov (KS) tests per Bortezomib Signature feature. We applied these tests to two separate groups: 1) misclassified bortezomib-sensitive vs. high-confidence accurate bortezomib-sensitive and 2) misclassified bortezomib-resistant vs. high-confidence accurate bortezomib-resistant.
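
      A per-feature KS comparison of this kind can be sketched as below. The function name `ks_per_feature`, the feature names, and the synthetic arrays are hypothetical and stand in for well-level profiles restricted to the signature features.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_per_feature(misclassified, reference, feature_names):
    """Two-sample KS test for each feature (column) of two profile sets.

    misclassified: wells x features array of high-confidence incorrect profiles
    reference:     wells x features array of high-confidence correct profiles
                   from clones with the same true resistance status
    """
    results = {}
    for column, name in enumerate(feature_names):
        test = ks_2samp(misclassified[:, column], reference[:, column])
        results[name] = (test.statistic, test.pvalue)
    return results


# Synthetic example: the first feature is shifted in the misclassified
# wells, the second feature is not.
rng = np.random.default_rng(0)
misclassified = rng.normal(size=(50, 2))
misclassified[:, 0] += 2.0
reference = rng.normal(size=(60, 2))

results = ks_per_feature(misclassified, reference, ["shifted", "unshifted"])
```

      Features with a large KS statistic are those whose distribution differs most between misclassified and correctly classified wells, and averaging the statistic within feature groups gives a summary of the kind shown in panels B and C.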

      The code change to generate the UMAP coordinates and figure is available at https://github.com/broadinstitute/profiling-resistance-mechanisms/pull/130

      Description of analyses that authors prefer not to carry out

      1. [Response to Reviewer 2 - Minor Comments 1 and 2]: These are interesting suggestions! Still, we prefer not to speculate on the biological mechanism of the Bortezomib signature. Connecting morphological features identified as contributing to the Bortezomib Signature by Cell Painting to specific biological pathways would demand considerable cell-based assays to validate. In addition, our analyses suggest that the features contributing to the Bortezomib Signature are spread across a range of cellular compartments and channels, making it difficult to pin down specific mechanisms or pathways as likely contributors to bortezomib resistance. However, we are adding a figure to increase interpretability of the signature, which will aid in developing future hypotheses. Note that the signature was not possible to detect by eye (Fig. 2 A).

      Reviewer 2 - Minor Comment 1:

      There could be some speculation on the mechanism of Bortezomib resistance concerning the literature with the existing image data. For example, Bortezomib resistance is connected to serine synthesis and how a particular feature could contribute to the known mechanism.

      Reviewer 2 - Minor Comment 2:

      Along the same lines, the authors could show that larger cells lead to resistance with microscopic images.

      2. [Response to Reviewer 2 - Major Comment 8]: We appreciate the reviewer’s concern that our work using HCT116 clonal cell lines may not directly reflect results from patient samples. Our choice was based on previously published work demonstrating the efficiency with which HCT116 cells generate resistant clones due to diminished DNA mismatch repair and decreased expression of drug efflux pumps. Since our work is a proof of concept rather than a comprehensive demonstration of translating morphological profiling into clinical practice, we believe that experiments using multiple patient cell lines from different tissues, as well as digital pathology records, are beyond the scope of this work. We instead chose to tone down the language of our manuscript to more clearly acknowledge the limitations of our work and clarify this as a proof of concept.

      Reviewer 2 - Major Comment 8 (relevant excerpt):

      I suggest the authors test their approach on at least two other cell lines (maybe from different tissues) and benchmark their results against a dataset of digital pathology where such predictions are made from stained and analyzed tissue slices. This way, after a thorough benchmark against related third-party data sets, the method would significantly gain relevance, the paper would appeal to a broader audience, and the advance gains more merit.

      3. [Response to Reviewer 3 - Major Comment 2]: The bortezomib sensitivity of ixazomib- and CB-5083-resistant clones was not determined, and hence cannot be ruled out as a possible explanation for their high Bortezomib Signature scores. However, we prefer not to conduct additional proliferation assays for the misclassified clones (IX02, WT06, CB14, CB16) in the presence of bortezomib to determine whether coincidental bortezomib resistance might explain the signature performance. Our rationale is that three other misclassified clones (WT10, WT15, and BZ06) had the expected bortezomib sensitivity in proliferation assays (Fig. 1-Supplement 2), meaning that additional proliferation assays may not reveal any insights regarding the signature performance.

      Reviewer 3 - Major Comment 2:

      Was the bortezomib sensitivity of the IX (or CB) resistant cell lines determined? If there were differences, this could explain some of the variation in the morphological signatures. This could be easily done in one or two growth experiments.

      4. [Response to Reviewer 2 - Major Comment 7]: Thank you for pointing this out. Our goal is to keep the study multi-disciplinary. We are adding a figure to increase interpretability of the signature, and adding text-based clarifications.

      Reviewer 2 - Major Comment 7 (relevant excerpt):

      While reviewing, I thus wondered which audience the authors targeted with their manuscript. A more focused analysis of their data that highlights aspects of the study either for the machine learning community, the cell biology community, or the precision oncology community would greatly benefit the manuscript's impact. In its current form, the study's findings seem diluted and spread across a wide range of research questions.

      5. [Response to Reviewer 2 and 3 - Major Comments 6 and 4]: We prefer not to expand the scope of the model to predict other drug signatures. This would require a substantial amount of work to generate the appropriate drug-resistant clones, collect the imaging data, and analyze it, and we think it is important to convey that the purpose of our paper is proof of concept. We do not feel that the time invested in performing this analysis would result in adequate returns beyond what we already demonstrate.

      Reviewer 2 - Major Comment 6.

      Interestingly, the Bortezomib signature is specific to the drug and not a broad range of proteasomal inhibitors. However, seeing the common features between all the proteasomal inhibitors would be interesting.

      Reviewer 3 - Major Comment 4

      There was some predictive ability of the Bortezomib Signature for ixazomib resistance. Were there some features that were correlated with IX-resistance, i.e. UPS pathway, versus specific to bortezomib? Do the features suggest anything about resistance mechanisms or is the feature set too abstruse to interpret?

      References

      Foroutan, Momeneh, Dharmesh D. Bhuva, Ruqian Lyu, Kristy Horan, Joseph Cursons, and Melissa J. Davis. 2018. “Single Sample Scoring of Molecular Phenotypes.” BMC Bioinformatics 19 (1): 404.

      Heiser, Katie, Peter F. McLean, Chadwick T. Davis, Ben Fogelson, Hannah B. Gordon, Pamela Jacobson, Brett Hurst, et al. 2020. “Identification of Potential Treatments for COVID-19 through Artificial Intelligence-Enabled Phenomic Analysis of Human Cells Infected with SARS-CoV-2.” bioRxiv. https://doi.org/10.1101/2020.04.21.054387.

      McInnes, Leland, John Healy, Nathaniel Saul, and Lukas Großberger. 2018. “UMAP: Uniform Manifold Approximation and Projection.” Journal of Open Source Software 3 (29): 861.

      Nyffeler, Johanna, Clinton Willis, Ryan Lougee, Ann Richard, Katie Paul-Friedman, and Joshua A. Harrill. 2020. “Bioactivity Screening of Environmental Chemicals Using Imaging-Based High-Throughput Phenotypic Profiling.” Toxicology and Applied Pharmacology 389 (January): 114876.

      Rohban, Mohammad Hossein, Shantanu Singh, Xiaoyun Wu, Julia B. Berthet, Mark-Anthony Bray, Yashaswi Shrestha, Xaralabos Varelas, Jesse S. Boehm, and Anne E. Carpenter. 2017. “Systematic Morphological Profiling of Human Gene and Allele Function via Cell Painting.” eLife 6 (March). https://doi.org/10.7554/eLife.24060.

      Simm, Jaak, Günter Klambauer, Adam Arany, Marvin Steijaert, Jörg Kurt Wegner, Emmanuel Gustin, Vladimir Chupakhin, et al. 2018. “Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery.” Cell Chemical Biology 25 (5): 611–18.e3.

      Wacker, Sarah A., Benjamin R. Houghtaling, Olivier Elemento, and Tarun M. Kapoor. 2012. “Using Transcriptome Sequencing to Identify Mechanisms of Drug Action and Resistance.” Nature Chemical Biology 8 (3): 235–37.

      Wawer, Mathias J., Kejie Li, Sigrun M. Gustafsdottir, Vebjorn Ljosa, Nicole E. Bodycombe, Melissa A. Marton, Katherine L. Sokolnicki, et al. 2014. “Toward Performance-Diverse Small-Molecule Libraries for Cell-Based Phenotypic Screening Using Multiplexed High-Dimensional Profiling.” Proceedings of the National Academy of Sciences of the United States of America 111 (30): 10911–16.

      Way, Gregory, Yu Han, David Stirling, and Shantanu Singh. 2023. Broadinstitute/profiling-Resistance-Mechanisms: Analysis for Preprint. Zenodo. https://doi.org/10.5281/ZENODO.7803787.

      Way, Gregory P., Maria Kost-Alimova, Tsukasa Shibue, William F. Harrington, Stanley Gill, Federica Piccioni, Tim Becker, et al. 2021. “Predicting Cell Health Phenotypes Using Image-Based Morphology Profiling.” Molecular Biology of the Cell 32 (9): 995–1005.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The authors use Cell Painting, a high-content image-based phenotypic assay, to distinguish between clonal cancer cell lines that are resistant versus sensitive to a proteasome inhibitor anti-myeloma drug called bortezomib. The authors characterized a high-dimensional cell morphology signature for bortezomib-resistance, evaluated it on an independent subset of cell lines, and evaluated specificity in respect to other drugs targeting the ubiquitin-proteasome system. The authors thus propose image-based morphology characterization as an alternative method for characterizing drug resistance.

      Strengths: solid methodology - cell lines validation of drug resistance, extensive data collection, thorough validation of the analysis pipeline, avoiding potential confounders, biases and proper data partitioning to test and hold-out (what the authors refer to as "machine learning best practices").

      Weakness: weak discriminative signal. Some aspects of the writing could be improved to make the manuscript easier to follow (see Minor comments).

      Major comments:

      While I am convinced that the signature captures morphological phenotypes associated with drug resistance, at the cumulative scale, the discriminative signal of a single cell type seems weak. Specifically, it is not clear whether the signature can effectively capture the drug resistance of a single cell line. In Figure 2-Supplement 9, considering the test (C) and the holdout (D), only 1/9 BZ clones' median signatures were beyond the 95% confidence interval, with 4/6 and 2/6 WT cell types with median signatures beyond the positive and negative 95% confidence interval, respectively. When defining bortezomib-sensitivity according to the sign (>0 or <0) of a cell line's median signature, Figure 2-Supplement 9 shows that in the test+holdout there are 9/9 correct bortezomib-resistance (BZ) and 6/7 correct bortezomib-sensitive (WT) predictions. However, similar discrimination levels also appeared in the other drugs (ixazomib, CB-5083), making the statements about specificity less grounded. When the authors evaluate the AUROC they report ~0.6 (line #194) for the non-specific (ixazomib, CB-5083) drugs versus ~0.75 for bortezomib-resistance (line #202). With Fig. 4, the data fully support the argument that the bortezomib-signature encodes bortezomib-resistance, but the signal is weak. Thus statements such as "We found the Bortezomib Signature could predict whether a cell line was bortezomib-resistant or bortezomib-sensitive" (line #172) and the specificity statements in the abstract (line #28) are not supported by the data in my opinion. I would recommend that the authors tone down these and other related statements throughout the manuscript. An alternative would be to increase the number of wells and see whether this weak signal can indeed be statistically amplified with many replicates to make a robust and specific characterization of a cell line's bortezomib-sensitivity (but I assume this is a lot of work and probably out of scope of this manuscript). I think it is also important to discuss in more detail the interpretation of these results (including Figure 2-Supplement 9), in this context, in the Discussion.
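
      To make the two measures in this comment concrete (sign-based calls on median signatures, and AUROC), here is a minimal sketch. The scores below are made up for illustration and are not taken from the manuscript; the function names `sign_call_accuracy` and `auroc` are hypothetical helpers, not part of the authors' pipeline.

```python
def sign_call_accuracy(resistant_scores, sensitive_scores):
    """Call a line resistant when its median signature is > 0, sensitive when < 0."""
    correct = sum(s > 0 for s in resistant_scores) + sum(s < 0 for s in sensitive_scores)
    return correct / (len(resistant_scores) + len(sensitive_scores))


def auroc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical median signatures: every sign is correct, yet the values sit close to zero.
resistant = [0.05, 0.02, 0.08]
sensitive = [-0.03, -0.01, -0.06]
print(sign_call_accuracy(resistant, sensitive))

# Hypothetical per-well scores with heavy overlap between the two groups.
print(auroc([0.2, -0.1, 0.4], [-0.3, 0.1, 0.3]))
```

      The sign-based tally can look strong even when every median lies close to zero, which is why the confidence intervals and the AUROC on individual wells are the more informative measures of how well a single cell line can be classified.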

      Minor comments:

      Suggested clarifications (some might be less relevant if the manuscript is designed for experts in the more clinical domain who are familiar with these terms / style):

      1. More information in Fig. 1's legend would be helpful to follow the experimental design. I found it hard to follow in its current form and had to go back to carefully reading the main text to fully understand.
      2. What is LD90 (line #87)? LD50 (line #97)?
      3. It was not clear to me in the text which and how many cell lines were evaluated and the reader is forced to go to the SI. For example, "(BZ01-10 and BZ clones A and E)" (line #96-97) and "wild-type clones (WT01-05, 10, and 12-15)" (line #98) appeared when presenting the results without a clear explanation and made it harder for me to follow. Summary of the data (for example, based on Figure 2-Supplement 8) can be briefly mentioned in the text to make it more clear for the reader.
      4. I found the visualization in Fig. 2C-D not intuitive (it is properly explained in the legend). I suggest replacing the accuracy colorbar with a color marker to make it more distinct from the random permutation (|--*--|) The location of the text "mean +- SD of 100 random permutation" made me first think that it is linked to the holdout.
      5. Lines #104-111 were duplicated in lines #114-122.
      6. The "Bortezomib Signature" is a critical measurement but is only briefly mentioned in lines 150-151 ("..based on the direction-sensitive ranking method for phenotype analysis, singscore (Foroutan et al., 2018)"). Please provide more information/intuition.
      7. How is the Bortezomib Signature related to the "on-disease"/"off-disease" scores described in https://www.biorxiv.org/content/10.1101/2020.04.21.054387v1.full? Are there other alternatives used for similar binary phenotypic signatures? What is the justification for using these measurements? I would love to see this generalized concept explicitly discussed in the Discussion.
      8. I would highly recommend showing the Bortezomib Signatures from Figure 2-Supplement 9. in Fig. 2. This was the main measurement used throughout the manuscript and in my opinion, it is very important to consistently visualize the data along the manuscript, for clarity and easier reader interpretation.
      9. Line #155-156: "We found that the pattern of Bortezomib Signatures corresponded to the cell identity plate layout", the word "not" is missing before "corresponded".
      10. Additional details on the processing steps in the analysis pipeline in the Methods will be highly appreciated.
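
      As an illustration of the kind of intuition requested in comment 6, here is a simplified sketch of a direction-sensitive, rank-based score. It follows the general idea described for singscore (rank the features within one sample, then reward high ranks for "up" features and low ranks for "down" features), but it is an assumption-laden toy version and not the package's exact normalisation. The feature names and values are hypothetical.

```python
from statistics import mean


def rank_fractions(profile):
    """Map each feature to its within-sample rank divided by the number of features."""
    ordered = sorted(profile, key=profile.get)  # ascending by value; ties keep input order
    return {name: (i + 1) / len(ordered) for i, name in enumerate(ordered)}


def directional_score(profile, up_features, down_features):
    """Positive when 'up' features rank high and 'down' features rank low in this sample."""
    frac = rank_fractions(profile)
    up_score = mean(frac[f] for f in up_features) - 0.5
    # Reverse the ranking for the 'down' set so that low values contribute positively.
    down_score = mean(1 - frac[f] for f in down_features) - 0.5
    return up_score + down_score


# Hypothetical single-sample profile of four morphology features.
sample = {"feat_a": 5.0, "feat_b": 4.0, "feat_c": 1.0, "feat_d": 0.0}
print(directional_score(sample, ["feat_a", "feat_b"], ["feat_c", "feat_d"]))
print(directional_score(sample, ["feat_c", "feat_d"], ["feat_a", "feat_b"]))
```

      Because the score depends only on ranks within a single sample, it can be computed per well without reference to other wells. That is the property that makes this family of methods attractive for single-sample scoring.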

      Referees cross-commenting

      My main criticism concerns "overselling" a weak discriminative signal. Specifically, I am not convinced that the major claims regarding predicting sensitivity and specificity at the scale of a single cell type are supported by the data. Since reviewers #2 and #3 did not raise this concern, I think it is worth discussing here.

      Once these statements are toned down, I think no significant additional work is needed to make the point that they can measure a discriminative signal. If they want to make these claims, perhaps they'd like to collect more data to gain statistical power (but I am not optimistic this will work at the single cell level).

      Personally, I was happy with the authors' choice of cell lines not included in the training dataset. I am not convinced that additional cell lines + validations are necessary for making the point of a proof of principle.

      Significance

      Cell Painting has been applied to many problems, but as far as I am aware this is the first attempt at an image-based phenotypic characterization of drug resistance. While the authors established that this approach can measure, to some extent, bortezomib-sensitivity, at the current state of the results, I am not convinced that Cell Painting can be practically used to assess bortezomib-sensitivity of a single cell line. However, this does not imply that the same approach cannot achieve the goal, perhaps by using other Cell Painting markers for bortezomib-sensitivity, or with the same markers to assess sensitivity to different drugs. The Cell Painting and analysis approaches are not new and the clinical impact is questionable, but the technical aspects (data, analysis) are exceptional and the concept may hold as I described above.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary: Forer and Otsuka provide first-rate evidence for tethers fixed in place between separating anaphase chromosomes using electron tomography. The authors traced the anaphase movement of a number of living cells before fixation for examination using electron tomography. The manuscript is clearly written and provides an excellent introduction and discussion of the known literature. The reader will have an excellent background to see the importance of this work.

      Major comments:
      - Are the claims and the conclusions supported by the data or do they require additional experiments or analyses to support them?

      No further experiments are needed. The data are very supportive, and extremely clear.
      - Are the data and the methods presented in such a way that they can be reproduced? Yes.
      - Are the experiments adequately replicated and statistical analysis adequate? Yes.

      Minor comments:
      - Are prior studies referenced appropriately? Yes.
      - Are the text and figures clear and accurate? Absolutely.
      - Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      The authors are to be congratulated on their major contribution to this study on tethers between separated daughter chromosomes. It is a tour de force to go from the living cells to fixing and identifying the same separated chromosomes using electron tomography to see the ultrastructure of the fibers seen fir.

      Referees cross-commenting
      Thank you reviewer #2. The manuscript should be published. It is an excellent contribution.

      We thank the reviewer for the appreciation of the clarity and quality of our work.

      Reviewer #1 (Significance):

      Provide contextual information to readers (editors and researchers) about the novelty of the study, its value for the field and the communities that might be interested.

      This manuscript is the first to use electron tomography to identify the tethers between separated anaphase chromosomes. Forer and the late Michael Berns and their co-authors have published a number of papers using phase microscopy and lasers to report on the physical nature and elastic properties of these fibres in the past. Forer and Otsuka have presented first-rate evidence for the reality of these structures using electron tomography. This manuscript should be highlighted in the published journal.
      The chemical identity of these fibers, as the authors state, is unclear.

      The following aspects are important:

      • Audience: describe the type of audience ("specialized", "broad", "basic research", "translational/clinical", etc...) that will be interested or influenced by this research; how will this research be used by others; will it be of interest beyond the specific field?

      This exciting contribution will be read by anyone interested in mitosis. It will be of interest to all Cell Biologists because of the careful manner in which the living cells were studied before they were fixed for examination using electron tomography. The readers will be dreaming how they can use this process on their Cell Biology problems.

      • Please define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      I am a Cell Biologist who has made contributions, both in light microscopy and in transmission microscopy on dividing cells, both in tissue culture and in situ in avian and zebrafish embryos.

      We thank the reviewer for appreciating the significance of our work.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this paper, the authors use light microscopy and electron tomography to study anaphase chromosomes in crane fly spermatocytes. They find that there are two "tether" structures that connect telomeres of sister chromatids. One tether is thicker (denser) and extends between sister chromatids during early but not late anaphase, whereas a second, less-dense tether maintains contact with both sister chromatids in all examined stages of anaphase. The paper makes arguments as to what the tethers could or could not be. Specifically, they are too numerous to be ultrafine DNA bridges seen in various normal or abnormal segregation events and they also do not affect anaphase chromosome motion the same way ultrafine DNA bridges do.

      Major comments:
      The major claim that there are tethers that connect sister chromatids in anaphase is supported by the data. Moreover, the data resolve two types of tethers on the basis of their density. While it is unclear what the composition of the tethers is, the paper makes a convincing case that they cannot be the DNA ultrafine bridges seen in other studies. The discussion has sufficient caveats that most readers will see that more work is needed to identify the composition of the two tethers. In my opinion, no further experiments are needed to support the modest claims of this paper. Therefore, I only have minor comments that may hopefully improve the paper's clarity.

      We thank the reviewer for the positive evaluation of our work.

      Minor comments:
      It was argued that the tethers reported here were also seen in other species and cellular contexts, where the imaging work was done with projection EM imaging. Presumably, what is new here is the usage of electron tomography. It would help readers if the authors explained why the electron tomography done here was essential to arrive at key conclusions.

      Thank you for the useful comment. We have added the explanation of why electron tomography was critical to visualise small tether structures to the last paragraph of the Discussion on page 7.

      p.3 mitochondria appeared to be fixed properly ... (e.g., Figs. 1C, 2B) - I don't see any mitochondria in any figures. Perhaps this observation should be noted as "not shown"?

      We thank the reviewer for pointing this out. We have added an electron micrograph of mitochondria to the Supplementary Figure 1.

      p.3 The images shown in Figs. 1, 2, 4 - The figures should be called out in the order; in this case, Fig 3 has not been called out yet.

      We have corrected the order of the figures.

      p.4 we did not find any other connecting structures - Because the sample was processed by traditional EM methods, it's safer to add a caveat that other connecting structures could be missed if they were disrupted by sample prep or if they did not pick up stain as well as the two structures presented in this paper.

      We have clarified that our sample was chemically fixed in the first paragraph of the Discussion on page 4. Because the details of how our samples were prepared are described in the Method section, we did not add further details to this paragraph.

      p.7 we expect such structures to be commonly seen in other cell types as well if they are examined carefully - Instead of saying that examinations should be done "carefully", it would be more helpful to specify how other cell types should be examined. This work shows that the bridges can be found if the cells are either sectioned parallel to the spindle axis or if a sufficiently large volume is sampled.

      We have now clarified that 3D electron microscopy techniques such as electron tomography are critical to visualise small tether structures in the last paragraph of the Discussion on page 7.

      Please use consistent spelling/hyphenation of ultrafine/ultra-fine and word choice (strands vs. bridges).

      Referees cross-commenting
      I agree with my co-reviewers' comments and have no further suggestions.

      Reviewer #2 (Significance):

      This may be the first use of electron tomography to study the structural details of tethers that connect chromosomes in anaphase cells. The data is of sufficient quality to reveal differences in density. Namely, one class of tether appears to be an extension of the chromosome while the other class is composed of thin filaments. This study is novel in that it characterizes a mitosis-associated complex that is poorly studied compared to the microtubule-based spindle apparatus and the kinetochore. Hopefully, the tethers will draw more attention and further characterization by methods like super-resolution microscopy and cryo-electron microscopy. My expertise is in chromatin, mitotic machines, and cryo-electron tomography.

      We thank the reviewer for appreciating the novelty and the impact of our work.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      Tethers between telomeres of chromosomes in anaphase were inferred from earlier studies of laser microbeam cutting experiments. The current paper presents images from electron tomography of crane fly spermatocytes that substantiates the earlier inference. The authors deduce that the darker filaments and the lighter filaments that they visualize may be the structural tethers at telomeres.

      Major comments:

      The experiments are carefully done, and the conclusions are appropriately worded to qualify any caveats. This short communication is well-presented, and I have only a few comments.

      We thank the reviewer for appreciating the clarity and quality of our work.

      The authors should expand their list of references on bridges to include those listed by Warecki et al (Curr Biol 33:1-17, 2023; refs 15-26, etc).

      We do not think it is necessary to expand the list of references for ultra-fine DNA bridges. In the article we submitted, we discussed the Warecki et al. article in the penultimate paragraph of the Discussion; we concluded that the bridges that Warecki et al. described are different from ours in having so few per cell that they couldn't be tethers, and further that there was no evidence that those bridges were elastic. For those reasons, we do not find discussion of those proteins relevant to tethers, any more than listing all the proteins associated with ultra-fine DNA bridges would be relevant to the elastic tethers.

      In the Discussion, we discussed data suggesting that a known elastic protein titin was present; that is as far as we wanted to go on speculation of what the elastic component of tethers might be.

      The authors present arguments that the tethers are not the DNA bridges observed by others. However, they should try to address this experimentally by treatment of their preparations with DNase to see if the thick and/or thin filaments disappear.

      While we agree that it would be important to identify the components of the tethers, we are concerned that those experiments are beyond the scope of this manuscript. Nevertheless, we appreciate the constructive suggestion for the future research direction.

      Moreover, they should discuss in more detail the possible functions of (DNA) bridges, including the recent model from Bill Sullivan's lab (Warecki et al, Curr Biol, 2023) that they help to retain fragments of broken chromosomes. In addition, the authors should summarize the various proteins that may be associated with the bridges (as enumerated in the Warecki et al 2023 paper).

      As we describe above, we concluded that the bridges Warecki et al. described are different from the tethers that we report in our manuscript. Therefore, we do not think it is necessary to expand the discussion on the proteins and functions associated with ultra-fine DNA bridges.

      The authors could add a sentence to the Results or Discussion of whether the thicker tethers might become stretched as anaphase progresses to become the thinner tethers (Fig. 4G).

      We thank the reviewer for this suggestion. We actually mentioned this possibility in the third paragraph of our Discussion on page 7.

      The authors may want to add a few sentences to the Discussion about the "chromosomal bouquet" stage of leptotene of meiosis prophase I where the telomeres of chromosomes seem pulled together and associate with the nuclear envelope --- they could speculate if this might also be due to the tethers that they describe in spermatocytes.

      This is a very interesting possibility. While we would refrain from adding this speculation to our manuscript as it is beyond the scope of the main points, it is certainly an interesting avenue of future research.

      Minor comments:

      A few additional comments are as follows:

      p. 2 last sentence of first paragraph -modify the wording about "no structural evidence that identifies physical connections between separating telomeres", since there is some information from genetic and cell biology light microscopy experiments. Perhaps simply change "structural" to "ultrastructural".

      We have changed the wording as the reviewer recommended.

      p. 6, 5th line of second paragraph - change "ribosome DNA" to "ribosomal DNA"

      We have corrected it.

      Figure 1D - add the chromosome to the right of the schematic model (as suggested by Fig. 1B).

      We are sorry for the confusion. In Figure 1D, only the left half of the tethers is 3D modelled and shown. We have clarified this point by modifying the legend of Figure 1D.

      p. 17 (Methods), line 10 of first paragraph - state if this is light or heavy Halocarbon oil (give details).

      It is a mixture of heavy and light Halocarbon oil. We have clarified it on page 17.

      p. 17 (Methods), line 12 of first paragraph- state the concentration for fibrinogen and for thrombin.

      As we wrote in the original manuscript, the procedures are described in detail in our previous publication (Forer A. & Pickett-Heaps J. (2005) Fibrin clots keep non-adhering living cells in place on glass for perfusion or fixation. Cell Biology International 29: 721–730). Nonetheless, to clarify this point, we have modified the text on page 17.

      p. 17 (Methods), line 4 of second paragraph - is there any data to show that the filaments (tethers) occur if there is no cold shock?

      Yes, we do see similar filamentous structures in the sample without cold shock. For your information, we show one of the electron micrographs below. In our manuscript, we show the data from the samples prepared with cold shock, because it better visualizes the filamentous structures. We now show these electron micrographs in the Supplementary Figure 2.

      Referees cross-commenting
      I concur with Reviewers #1 and #2 that this is a fine paper that should be published. My detailed comments submitted with my review are simply meant as revisions to further strengthen this paper.

      We thank the reviewer for supporting the publication of our manuscript.

      Reviewer #3 (Significance):

      Strengths: This is an important conceptual advance and the carefully done ultrastructural imaging provides the foundation for future studies that could delve into the molecular composition and functional significance of the tethers at telomeres of anaphase chromosomes seen here by 3D electron microscopy.

      Limitations: the molecular composition and functional roles are not yet known for the tethers seen here by 3D electron microscopy, but to do so would involve an entire new program of experimentation.

      Advances: there have only been two earlier ultrastructural papers on tethers at telomeres, and the tethers were peripheral to the main focus of those papers. Thus, the current paper extends our ultrastructural information about tethers.

      Audience: this work is of importance for scientists who study the mechanics of chromosome movement on spindles, including regulation to combat aneuploidy. This work will also be important for a broader audience to inform them about transmission of the hereditary information to daughter cells.

      We thank the reviewer for appreciating the significance and the impact of our work.

    1. While there are rich areas of study in animal communication and interspecies communication, our focus in this book is on human communication. Even though all animals communicate, as human beings we have a special capacity to use symbols to communicate about things outside our immediate temporal and spatial reality (Dance & Larson, 1976). For example, we have the capacity to use abstract symbols, like the word education, to discuss a concept that encapsulates many aspects of teaching and learning. We can also reflect on the past and imagine our future. The ability to think outside our immediate reality is what allows us to create elaborate belief systems, art, philosophy, and academic theories. It’s true that you can teach a gorilla to sign words like food and baby, but its ability to use symbols doesn’t extend to the same level of abstraction as ours. However, humans haven’t always had the sophisticated communication systems that we do today.

      With 126 published definitions of "communication," it is vital for a speech class to touch on forms of communication other than speaking. Humans have some of the widest ranges of speech (i.e., various languages), and these often do not translate seamlessly, so other universal abstract symbolism is needed alongside spoken communication to bridge the gap. Even our written language, and the meaning we assign to certain methodical squiggles on paper, varies widely. So do less obvious ways of communicating, like gestures and body language, which may seem inconsequential to one person but monumentally offensive to another. The intricately woven methods of communication that we as a species have created are a fascinating study, one that goes beyond merely standing in front of a group of peers and talking at them for 3-5 minutes about a chosen topic.

    2. Like other forms of communication, intrapersonal communication is triggered by some internal or external stimulus. We may, for example, communicate with our self about what we want to eat due to the internal stimulus of hunger, or we may react intrapersonally to an event we witness. Unlike other forms of communication, intrapersonal communication takes place only inside our heads.

      Everyone on this planet has intrapersonal communication. I talk to myself every day, and I have conversations with myself on what I'm going to do or what I need to do. Some people talk to themselves to calm down, or they journal to ease their minds. When something surprising happens people usually react somehow in their head, basically when anything happens people react to themselves. Just as the text states, "We also use intrapersonal communication or “self-talk” to let off steam, process emotions, think through something, or rehearse what we plan to say or do in the future." Intrapersonal communication happens almost every second throughout one person's day.

    1. In fact, it might be good if you make your first cards messy and unimportant, just to make sure you don’t feel like everything has to be nicely organized and highly significant.

      Making things messy from the start as advice for getting started.

      I've seen this before in other settings, particularly in starting new notebooks. Some have suggested scrawling on the first page to get over the idea of perfection in a virgin notebook. I also think I've seen Ton Zijlstra mention that his dad would ding every new car to get over the new feeling and fear of damaging it. Get the damage out of the way so you can just move on.

      The fact that a notebook is damaged, messy, or used for the smallest things may be one of the benefits of a wastebook. It averts the internal need some may find for perfection in their nice notebooks or work materials.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      The manuscript by Rigger and Brenner details the role of the vimentin network in advancing OA pathogenesis by exacerbating premature senescence. The data are well presented and the study is of interest, in that little is known about vimentin in cartilage biology.
      The authors used OA-derived cartilage explants and chondrocyte cultures, which were graded for severity and compared accordingly. Figure 1 shows that markers of senescence are increased with structural damage, which is well established and consistent with the literature. Using a DOX model, the authors induce premature senescence and show a disrupted vimentin network. However, upon KD of CDKN2A, a marker of senescence, they did not observe complete reversal of CSV presentation.
      Next, the authors show in Figures 4 and 5 that the reduction or dismemberment of vimentin structures is linked to senescence and may act as a contributing factor.
      Figures 6 and 7 then go on to show that upon advanced passage chondrocytes lose their vimentin network and tend to senesce and mineralize.

      Reviewer #1 (Significance):

      Strength:
      This is a very novel study showing a link between vimentin and senescence in chondrocytes. The data are in line with other data. The work is clearly written, structured and well displayed.

      Author's response:
      We thank reviewer #1 for their interest in our work and their overall positive report.

      Suggestions for improvement:

      While the study is very thorough in describing the markers of senescence and the vimentin network, it lacks insight into the mechanism, which isn't completely deciphered. Are there links to key transcription factors?

      Author's response:
      The transcriptional regulation of vimentin in human cells is very complex. The VIM promoter region comprises multiple elements, such as an NF-κB-binding site, a PEA3-binding site and two AP1-binding sites (Zhang et al., 2003). Moreover, it was recently demonstrated that redox signaling is involved in vimentin expression at the wound margin after tissue injury in zebrafish (LeBert et al., 2018). However, it has also been reported that IL-1β stimulation results in reduced gene expression of vimentin via p38-signalling in cartilage degeneration and OA progression (see manuscript REF. 36,37).

      In our study, we observed that enhanced CSV levels are associated with decreased vimentin gene expression, indicating a lower stability of the mRNA or decreased transcription of VIM in senescent chondrocytes (maybe due to enhanced p38-signalling as mentioned above). Since the transcriptome in senescent cells is radically changed, this question cannot be answered easily.

      In future studies, we will rather try to clarify the underlying mechanism of vimentin externalization. There are still many questions to be answered: is the CSV anchored in the cell membrane (which anchor protein?) and is there still a connection to the intracellular vimentin network? Which proteins are involved in the externalization process: maybe comparable to phosphatidylserine exposure, mediated by flippases, scramblases, and lipid transfer proteins or rather by vesicles?

      Literature mentioned above (not included in manuscript):

      LeBert et al., 2018: Damage-induced reactive oxygen species regulate vimentin and dynamic collagen-based projections to mediate wound repair. DOI: 10.7554/eLife.30703

      Zhang et al., 2003: ZBP-89 represses vimentin gene transcription by interacting with the transcriptional activator, Sp1. DOI: 10.1093/nar/gkg380

      It is also unclear if disruption of the network is more detrimental than KD in promoting senescence.

      Author's response:

      KD of Vimentin led to a gradual decrease of intracellular Vimentin content and consequent stress. The cells were analyzed 7 days after induction of the KD and exhibited a stable senescent phenotype, comparable to Doxorubicin-treated chondrocytes (treated with very low concentrations over several days to produce only mild but ongoing stress). These models might reflect the pathophysiologic situation: We think that cellular stress due to mechanical impact and subsequent oxidative stress/low-grade inflammation might lead to a gradual disruption or re-organization of the vimentin network, which is accompanied by decreased vimentin gene expression.

      In the case of the disruption of the vimentin network by Simvastatin, the stress response was very intense and rapid (24 h), and the treatment was only conducted as a proof-of-principle experiment. Despite the upregulation of some senescence-associated markers, we don't think that permanent Simvastatin treatment would be suitable to obtain a stable senescent phenotype, but rather expect the cells to die due to excessive stress.

      It would have been good to include murine OA models to understand these processes better, and make a stronger physiological connection with OA of the joint.

      Author's response:

      The CSV antibody is only suitable for human cells and cannot be used for immunohistochemistry. Therefore, all previous reports of CSV are based on human (isolated) cells. At the current time point, it would not be possible to stain CSV in joints of mice after induction of PTOA due to the methodological limitations. We actually tested the CSV antibody in isolated lapine chondrocytes and found a high percentage of CSV-positive cells, even at low passages. Although stress increased the amount of CSV-positive lapine cells, we did not consider the results reliable due to the high percentage in unstressed cells, which might result from unspecific antibody binding.

      Overall, we think that the usage of clinical OA samples is convincing and reflects the pathophysiologic situation in the human OA joint.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript provides solid evidence for an association between cell surface vimentin (CSV) and chondrocyte senescence. Human cartilage and cultured chondrocytes are used with a wide range of approaches to provoke senescence: natural osteoarthritis, traumatic loading ex vivo, doxorubicin to cells in monolayer, vimentin siRNA, and simvastatin. In contrast, relatively little was done to try and interrupt or reverse the role of CSV in senescence, with CDKN2A siRNA representing one attempted intervention. The manuscript is well written and the data are presented in a logical and clear manner, with a high likelihood of being reproduced in subsequent studies.

      Author's response:

      We thank reviewer #2 for their interest in our work and their mainly positive report.

      Regarding their comment on our attempts to reverse CSV on senescent chondrocytes, we would like to add the following: Reversal of cellular senescence is a very ambitious challenge. But in fact, we are currently preparing a manuscript in which we characterize an appropriate senolytic strategy to “rejuvenate” human chondrocytes and plan to use this approach to reduce the amount of senescent and thus CSV-positive cells in future experiments.

      Major comments:

      In the doxorubicin experiments, the senescent cells show a spread morphology as expected. Given the importance of vimentin in cell spreading (as the authors' own data show), the possibility that spread morphology itself (and not senescence) leads to CSV should probably be examined. This could perhaps be achieved by plating with different concentrations of fibronectin or other matrix proteins that produce a spread morphology to a degree that matches the doxo. If the cells remain spread for ~10 days but don't become senescent and don't have CSV, this would provide further support for a direct relationship.

      Author's response:

      We agree that cell spreading is associated with various cellular processes (for example by the YAP signaling pathway). Moreover, we would like to thank the reviewer for the proposed experiment.

      Seeding of cartilage cells on fibronectin-coated plates is a commonly used procedure to isolate chondrogenic stem progenitor cells (CSPCs), due to their higher affinity to fibronectin. The cells are usually cultured for several days on the coated plates and do not exhibit a flattened, senescent-like phenotype (as we observe for Doxorubicin-treated cells), but an elongated, fibroblast-/stem cell-like shape. Our results (Figure 6E) demonstrate that CSPCs have no increased CSV levels, despite their elongated (not flat) morphology.

      There are some findings supporting the assumption that CSV leads to enhanced cell adhesion, but not that adhesion or cell spreading promotes CSV: we included experiments with HeLa (low CSV levels) and SaOS-2 (high CSV levels), which demonstrated that high CSV levels are associated with increased plastic adhesion (Figure S5). In line with this, we demonstrated that higher CSV levels on chondrocytes were associated with enhanced fibronectin and vitronectin binding, which might explain increased plastic adhesion. Moreover, Simvastatin stimulation and subsequent cellular stress by Vimentin disruption resulted in enhanced CSV but did not lead to cell spreading (Actin not affected, cells rather elongated, not flattened).

      Minor comments:

      The CSV antibody and staining method appeared to have generated some signal from debris, which makes it challenging to assess the localization of true staining. Presumably the true staining would be present only on the cell surface. While the widefield view is appreciated, perhaps insets with a higher magnification would clarify.

      Author's response:

      In Figure 2h and Figure 2i, we provide insets of the IF staining and an exemplary image obtained by scanning electron microscopy (SEM). CSV is not localized on debris; Figure 2h actually shows the cell surface. The magnified, Doxo-treated cell is highly senescent and thus flattened. The uneven (rather spotted) staining pattern of CSV and the unusual shape of the cell might give the impression that this is debris rather than the cell membrane.

      For figure 1k, it is a bit surprising that CDKN2A would peak so early after injury and then drop off. Most studies in other systems show a gradual increase in CDKN2A levels with persistent stress as opposed to a rapid increase in response to acute stress. Could the drop-off be due to preferential death of these cells? The CSV % in 1m was taken from 7d after trauma (plus 7 days in monolayer it appears). Further discussion on the timing of traditional senescence markers as compared to the emergence of CSV would be useful.

      Author's response:

      We would like to thank the reviewer for this comment. That CDKN1A was induced by mechanical trauma without significant decrease at the later time points was in line with the P53 expression, which we detected via immunohistochemistry (IHC; positive staining of chondrocyte nuclei in cartilage). P53 and P21 are regarded as interconnected senescence markers. Interestingly, P53 is not regulated on gene expression level upon cartilage trauma or Doxorubicin stimulation, but there is a significant increase in P53 nuclear translocation.

      Although such a discrepancy between gene expression and protein activity has not been reported in the case of P16 or P21, we plan to investigate the dynamics of these cell cycle regulators and their connection to CSV after cartilage trauma in more detail in future studies.

      We included the following statement in the discussion part:

      “In the current study, we observed that CSV on chondrocytes was reduced by siRNA-mediated silencing of CDKN2A and increased after Doxo treatment or cartilage trauma. While we confirmed that mRNA levels of both CDKN1A and CDKN2A were significantly enhanced upon injury but exhibited different expression levels over time, we determined CSV-positive cells only at one time point after ex vivo cartilage trauma. Future studies might also consider earlier and later time points after cartilage injury to identify a potential time-dependent peak or decline in CSV-positive chondrocytes. In this way a potential association between CSV and the expression levels of CDKN1A and CDKN2A, which are thought to play differential roles in initiating and maintenance of senescence, respectively [50], might be clarified.”

      [50] Stein G, Drullinger L, Soulard A, and Dulić V. Differential Roles for Cyclin-Dependent Kinase Inhibitors p21 and p16 in the Mechanisms of Senescence and Differentiation in Human Fibroblasts. Mol Cell Biol. 1999;19(3): 2109–2117. https://doi.org/10.1128/mcb.19.3.2109.

      There is no CSV staining shown for figures 4 and 5. While the quantification of CSV was done by flow cytometry, it would be a nice confirmation to see the increase in CSV on the surface of cells with either siRNA for vimentin or the simvastatin.

      Author's response:

      CSV-IF of simvastatin-treated chondrocytes is provided in Figure 5 (b). We did not perform exemplary staining of CSV after VIM-KD, because the quantification was performed via flow cytometry.

      Reviewer #2 (Significance):

      The strengths of the study include a rigorous design and the establishment of a potential new cell surface marker of chondrocyte senescence. The main limitation is that the conclusions are largely descriptive in nature.

      If CSV is confirmed as a robust marker of senescence, this would be of value to the field. While this marker has been explored previously in other systems, there is value in this manuscript given the wide range of contexts investigated for a cell type in which senescence likely has an important role.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This study presents a sound piece of science in the puzzle about extracellular vimentin in the differentiation/dedifferentiation of human chondrocytes and senescence and osteoarthritis. Even though no mechanism is elucidated, the results clearly point towards a correlation between the amount of extracellular vimentin and the level of chondrocyte senescence, and therefore signs of osteoarthritic changes in the cultivated chondrocytes. The methods applied are state-of-the-art and provide the means to generate meaningful results in this experimental setting. The paper is concise and clearly written; there are only minor remarks.

      Author's response:

      We thank reviewer #3 for their interest in our work and their overall positive report.

      Minor comments:

      1. The main point of the paper is extracellular vimentin around chondrocytes in culture; please provide better pictures (1g) to support this. Why is the extracellular staining so broad and not concentrated on the cell surface? The pictures chosen imply that a huge amount of vimentin is externalized in disease states. They also indicate that in diseased chondrocytes no intact or semi-intact vimentin network is found intracellularly. Please comment.

      Author's response:

      In Figure 1g, CSV is located on the cell membrane. The pattern of the staining was surprising to us, as well. CSV was not equally distributed on the membrane, but rather showed an irregular pattern. Sometimes the staining was located at the filopodia of the cells, sometimes the whole cell was covered by spots. We also observed this on cancer cells, which was in line with other studies using this antibody. It remains unclear whether the distribution of the CSV has any effect. But we assume that the high abundance in filopodia might be connected with cell adhesion and mobility, which was positively associated with CSV.

      Yes, chondrocytes isolated from highly degenerated tissue exhibited higher CSV levels compared to cells derived from macroscopically intact regions. Although we did not investigate the vimentin network of these cells, our observations in Doxo-treated cells imply, indeed, that intracellular vimentin might be altered in diseased chondrocytes. In line with this, Blain et al. (Ref. 13) reported that there is a disassembly of the intracellular vimentin network in OA chondrocytes, which can disturb the chondrocyte phenotype and contributes to the development of OA (see discussion).

      1. In the doxo experiment no extracellular vimentin is found? Please explain.

      Author's response:

      Doxo-treated cells are highly positive for CSV (= extracellular vimentin on membrane). However, the intracellular vimentin is strongly decreased and some cells seem to be negative. We have not clarified the underlying mechanism so far, but it seems that senescence/disease progression negatively affects the transcription of vimentin and, at the same time, promotes the externalization of the existing intracellular vimentin. Altogether, this might result in a decline in intracellular vimentin.

      1. What is the SEM picture showing? IGH? Are the red dots colloidal gold particles? In any case, the quantity of stain gathered at the EM level would not correlate with the huge amount seen in LM staining. Please comment.

      Author's response:

      For the SEM analysis, a gold particle-coated secondary antibody was used. The positive signal usually appears in white and was subsequently colored using software. In IF and ICC staining, we had a signal amplification due to the biotin-streptavidin system and the magnification makes, of course, a huge difference.

      1. Why the ICC in Fig. 3c? The siRNA is not detected in the KD? A reduction of Vimentin could be shown via WB.

      Author's response:

      In Figure 3c, the KD of P16 was confirmed on protein level. In addition to the gene expression analysis, we chose the ICC (IF) to confirm that there is a decline in active (nuclear) CDKN2A. In the case of P53, our experience is that gene expression and the amount of cytoplasmic/nuclear protein might not be consistent.

      In Figure 4, we confirmed the successful KD of vimentin on mRNA and protein level (flow cytometry plus IF). Of course, WB would also be possible, but we decided to use the methods in which the antibody was well established and we wanted to visualize the disturbance of the intracellular vimentin network upon KD.

      1. Fig. 4c, why are there no remnants of the vimentin networks seen in the chondrocytes? A Knock-down, not a KO is shown.

      Author's response:

      In fact, most of the intracellular vimentin seems to be gone. However, there are some remnants (condensed fibers/ bundles) of the former vimentin network. We applied the VIM-KD over seven days. Usually, a KD experiment is only conducted for 2-3 days. But since we were not sure how stable the vimentin protein would be, we chose seven days. This long-lasting KD might have resulted in a strong decline of the protein. Moreover, the CSV levels on these cells were very high, indicating that existing vimentin was externalized and additionally decreased the amount of intracellular vimentin.

      1. Please comment on the concentration of simvastatin, why not nmolar?

      Author's response:

      The concentration of Simvastatin was chosen in accordance with Trogden et al. (Ref. 26), who first described the effects of simvastatin on the vimentin network. A lower concentration might have had the advantage that the effects were less severe, allowing a longer observation time than 24 h. However, as a proof-of-principle model to demonstrate the connection between vimentin network collapse and CSV expression, the concentration worked quite well.

      1. CSV+ is misleading in Fig. 6g, it's not an overexpression.

      Author's response:

      We would like to thank the reviewer for this comment and removed the “+” to make it less misleading.

      1. The concept of EMT is debatable, at least in kidney fibrosis, and chondrocytes are not epithelial cells. Please add a more critical discussion point.

      Author's response:

      The authors agree with the reviewer’s argument that chondrocytes are not epithelial cells and that the term EMT doesn’t seem to be appropriate. However, this is one leading hypothesis proposed by the working group of Prof. Mayán, who described CX43 and other EMT markers on/in senescent chondrocytes (see reference 31; more recently: Cell Death Dis. 2022;13(8):681. doi: 10.1038/s41419-022-05089-w).

      We added the following passage in the discussion part to indicate that this hypothesis is a controversial concept:

      “Nevertheless, the hypothesis that chondrocytes might undergo an EMT-like process remains controversially discussed, because chondrocytes are mesenchymal and not epithelial cells. In a recent review, Gems and Kern propose to consider senescent chondrocytes as activated and hyperfunctional remodeling cells occurring during OA progression [49]. Accordingly, chondrosenescence might represent an unsuccessful attempt of tissue repair. They further suppose that the senescent or activated chondrocytes are associated with a hypertrophic, bone-forming phenotype, following the process of bone development rather than hyaline cartilage formation. In line with this, we observed that CSV was associated with enhanced osteogenic capacities and a decline in chondrogenic properties.”

      [49] Gems and Kern (2022): Geroscience. 2022;44(5):2461-2469. doi: 10.1007/s11357-022-00652-x.

      Reviewer #3 (Significance):

      The manuscript provides novel insight into the role of intermediate filaments, i.e. vimentin, in chondrocyte senescence and osteoarthritic changes in vitro. Its strength is a thorough elucidation of the connection with a wealth of experimental data; a weakness is the missing elucidation, or first experiments in the direction, of the cell biological mechanism.

      It is well suited for a broad audience, because it deals with fundamental cell biological phenomena, and it is definitely important for the OA/chondrocyte biology community.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      We don't see the case for 1,5-IP8 as settled in plants, and none of the papers mentioned above draws this strong conclusion. This may be due to several limitations in the available data. The mentioned studies do not allow the effects of 1-IP7 and 1,5-IP8 to be differentiated and, where binding or competition experiments have been performed, e.g. on the transcription factors, the differences in the Kd values for IP7 and IP8 were minor. Furthermore, 1,5-IP8 levels and Pi starvation response do not always correlate. ITPK1 mutants, for example, show Pi overaccumulation, and low 5-IP7, but normal 1,5-IP8 (Riemer et al., 2021). Finally, plants are complex organisms with multiple tissue types that serve for accumulating, exporting, transporting or finally consuming Pi. Therefore, correlating inositol pyrophosphate levels from whole-plant extracts with a Pi starvation response is problematic, unless both sets of data can be obtained from the same cell types or at least tissues.

      The comment of the reviewer made us recognize that the complex situation in plants deserves a more detailed coverage and we have therefore adjusted the introduction accordingly.

      Results: "We determined the corresponding lysines in Pho81 (Fig. S3), created a point mutation in the genomic PHO81 locus that substitutes one of them, K154, by alanine, and investigated the impact on the PHO pathway."

      In my opinion, it would be important to test here in a quantitative in vitro binding assay if (i) the SPX domain of Pho81 can bind PP-InsPs including 1,5-InsP8, (ii) if the dissociation constant is in agreement with the cellular levels of 1,5-InsP8 in yeast (compare Fig. 2) and (iii) if the K154A mutation blocks or reduces the binding of 1,5-InsP8. Without such experimentation, I find the statement "this result underlines the efficiency of the K154A substitution in preventing PP-IP binding to the Pho81 SPX domain." to be overly speculative, as no binding experiment has been conducted.

      We agree with the comment of the reviewer concerning the overstatement in the phrase. It has been deleted.

      As mentioned already in our previous work (Wild et al., 2016), Pho81SPX counts among the SPX domains that we could not express recombinantly. Likewise, full-length Pho81, which would be the relevant object for correlating in vitro binding studies with the cellular concentrations, has not been accessible. Expression in yeast did not provide sufficient material for ITC or other quantitative techniques. Therefore, we refrained from pursuing binding studies. Nevertheless, given the high conservation of the positively charged patch on SPX domains and the fact that, in every case where it has been tested so far, SPX domains showed inositol polyphosphate binding activity, we find it a conservative assumption that the Pho81SPX binds them as well. This is supported by the effects of the binding site mutant, which mimics the effect of ablating IP8 synthesis.

      Results: "Inositol pyrophosphate binding to the SPX domain labilizes the Pho81-Pho80 interaction." Again, in the absence of any protein-protein interaction assay I find this statement not to be supported by the experiments outlined in the manuscript. The best way to address this point would be to perform either co-IP or in vitro pull-down experiments between Pho81-SPX and Pho85-Pho80, in the presence and absence of 1,5-InsP8 and/or using the Pho81 point-mutants described in the text.

      Since Pho81 could not be produced recombinantly, neither by us nor by others who worked on this protein previously, quantitative in vitro binding assays are not accessible for now. A simple IP suffers from the problem that Pho81 interacts with Pho85-Pho80 not only through the SPX domain but also through the minimum domain. The latter interaction may be constitutive. Since the main point of the manuscript is not to dissect the exact mechanisms of Pho85-Pho80 regulation, but only to address why the postulated inactivation of this kinase by a 1-IP7/minimum domain complex makes no sense, we prefer not to show a profound (and more complex) analysis of how the different Pho81 domains contribute to binding.

      To test the potential of the SPX domain for binding Pho85/Pho80 in vivo, we have created a GFP-fusion of the SPX domain of Pho81. This fusion protein localizes mainly to the cytosol when cells are on high-Pi. Upon Pi starvation, it concentrates in the nucleus. This concentration is not observed in a pho80 mutant background (New Fig. S7).

      In line with this, I would suggest moving the molecular modelling/docking studies from the discussion into the results section and using these models to design some interface mutations that could be tested in coIP and/or pull-down assays. Alternatively, the authors may choose to omit the discussion section starting with "Even though the minimum domain is unlikely to function as a receptor for PP-IPs this does not ..." and ending with "In sum, multiple lines of evidence support the view that the SPX domain exerts dominant, 1,5-IP8 mediated control over Pho81 activity in response to Pi availability."

      We have now moved the modelling data to the Results section. The structure prediction of the interface is experimentally validated. Data on the effect of interface substitutions are already published, although these substitutions had not been recognized as affecting a common interface at the time. Substituting the interface residues either on the side of Pho80 or of Pho81 constitutively activates Pho85-Pho80 kinase and destabilizes its interaction with Pho81. This was shown by Co-IP experiments from cell extracts by Huang et al. We mention the respective substitutions in the manuscript and cite the paper in which their effect on PHO pathway activation had been described.

      Reviewer #2 (Recommendations For The Authors):

      Some points need additional attention by the authors:

      • In general, it would be helpful to introduce abbreviations more thoroughly (certain enzyme names, PA, MD, ...)

      We paid more attention to this.

      • Also in general, the authors may want to think about the nomenclature of inositol pyrophosphates. Given the expansion of PP-IPs that are being detected in different organisms these days it may be a good time to convert to a more precise nomenclature, i.e. 5PP-IP5 instead of 5-IP7; and 1,5(PP)2-IP4, instead of 1,5-IP8. The latter could just be stated once, and then be abbreviated as IP8.

      To our understanding the field has not yet come up with a unified nomenclature. Therefore, we prefer to stick with the more practical nomenclature that we have chosen, which also corresponds to what is commonly used in presentations and discussions among colleagues. We have now introduced a sentence making the link to the nomenclature that the reviewer has proposed.

      • p. 1, Abstract: "negative bioenergetic impacts" - the phrasing seems really vague

      Agreed, but we find it difficult to be more explicit and precise in the abstract while remaining concise and not distracting from the main message. This aspect is better explained in the introduction.

      • p. 3, Significance statement: "... unified model across all eukaryotic kingdoms" While the intended meaning of this wording is better explained in the text later, the phrasing here suggests a more all-encompassing study at hand, instead of a conclusion that fits more closely with established reports from other organisms. Please rephrase.

      We have adapted the phrase to avoid this impression.

      • p. 4: "IPTKs" - are the ITPKs meant here?

      Yes, that was a typo.

      • p. 7, the introduction ends abruptly and could use a concluding sentence.

      Done

      • p.7, "enzymes diphosphorylation either the..."; I understand what the authors are trying to say with diphosphorylating, but the enzymes are phosphorylating a phosphorylated substrate.

      Yes. We changed the phrase to "....adding phosphate groups at the 1- or 5-positions....".

      • p. 7, subtitle "...concentrations and kinetics of..."; kinetics of what? Synthesis/turnover?

      We corrected this subtitle

      • p. 8, with regards to the recovery experiment: Was this recovery determined elsewhere (please cite)? Otherwise it would be beneficial to include an extra figure to illustrate these recoveries in the supplementary information. And do the authors suspect some hydrolysis of IP8 given the lower recovery?

      We have now added the experiment testing recovery of IPPs as the new Fig. S1.

      • p. 9: It is appreciated that the authors point out the concentration of IP6 in S. cerevisiae. I found that concentration rather low, and the authors could highlight this a bit more, given their ability to carry out absolute quantification.

      This was a leftover from a previous version of the paper. Since the paper does not treat IP6 or lower inositol polyphosphates, we have deleted this phrase.

      • p. 9, Fig 2: The exponential decay of 5-IP7 is very nicely shown in Figure 2c. But one of the most important discussion points is IP8 being the key controller of the PHO pathway - it would therefore be beneficial for the argument to also show the same kind of graph for IP8 and if possible, fit a function to the data points to better quantify and compare the decay processes (e.g. via "half-life time" of PP-IPs during starvation, in addition to the suggested "critical concentration" which was only discussed for 5-IP7 thus far).

      Kinetic resolution is an issue here. The approach shown in Figs. 2 and 5 is not apt to determine a critical concentration of IP8 because the decline upon transfer to starvation conditions is too fast and difficult to relate to the equally rapid induction of the PHO pathway. We shall address this point in a more appropriate setup in a future study.

      • p.9, Fig 2a: Where does the 5-IP7 come from in the kcs1Δ strain? In the text the authors state that 5-IP7 in kcs1Δ was not detected, but the figure suggests otherwise. Please explain.

      Currently, we do not know where these residual signals stem from. One possibility is that they represent other isomers that exist in minor concentrations and that are not resolved from 5-IP7 in CE. We added a sentence to the figure legend to indicate this.

      • p. 10: "IP8 was undetectable in kcs1Δ and decreased by 75% in vip1Δ. kcs1Δ mutants also showed a 2 to 3-fold decrease in 1-IP7, suggesting that the synthesis of 1-IP7 depends on 5-IP7. This might be explained by assuming that a significant source of 1-IP7 is synthesis of 1,5-IP8 through successive action of Kcs1 and Vip1, followed by dephosphorylation to 1-IP7." - Please specify this statement. Do the authors mean that 1,5-IP8 is only produced transiently below the detection capabilities of the method but that there still is a (reduced) flux from 5-IP7 to 1,5-IP8 to 1-IP7? Otherwise it would seem paradoxical to have a dependency on a non-existing metabolite in that cell line.

      This was not clearly expressed. The revised version now says: " ... a 2 to 3-fold decrease in 1-IP7, suggesting that the synthesis of 1-IP7 depends on 5-IP7. This might be explained by assuming that, in the wildtype, most 1-IP7 stems from the conversion of 5-IP7 to 1,5-IP8, followed by dephosphorylation of 1,5-IP8 to 1-IP7.". We hope that this clarifies the matter.

      • p. 10: "pulse-labeling approaches are not available for PP-IPs." While this statement is correct, a recent paper co-authored by Qiu and Jessen showed nice pulse-labeling data for the lower IPs and could be cited here (PMID: 36589890).

      Yes, indeed, we should have been more precise here. What we wanted to express was that rapid pulse-labeling methods for following phosphate group turnover were lacking, with a temporal resolution of minutes rather than hours. Existing pulse labeling approaches, including the study mentioned by the reviewer, do not provide that. We have changed the phrase accordingly.

      • p. 10: continuation of caption of Fig 2: "were extracted [and] analyzed"

      Corrected. Thank you.

      • p. 12: How is 1-IP7 made in the vip1 kcs1 double mutant?

      As explained above, we suspect that these may be side products of IPMKs, which accumulate in the absence of vip1 phosphatase.

      • p. 13, caption to Figure 3: "XXX cells were analyzed" please replace the placeholder XXX.

      Done. Thank you.

      • p. 13, Fig 3B, C, D and p. 50, Fig. S4: On screen the contrast between the different shades of grey of the bars is just visible enough, but not on paper. I suggest using a higher-contrast or different colouring scheme.

      We enhanced the contrast.

      • p. 24, 25, Fig 7.: I could not really appreciate the AlphaFold part, and found it unnecessary. No docking or molecular dynamics simulations were carried out here, and it was not clear to me what information should be gleaned from this part.

      Following this comment, we have modified the respective part of the text. This part refers to a publication from the O'Shea lab (Nat. Chem Biol. 4,25) proposing the model that 1-IP7 and the Pho81 minimum domain bind competitively to the active site of Pho85 to inhibit its kinase activity. Modeling of complexes between Pho81, Pho80 and Pho85, which we present in the manuscript, rather suggests binding of the minimum domain to a groove in Pho80. This is important because it provides a viable alternative model for the action of the minimum domain. It suggests the minimum domain as a constitutive linker that attaches Pho80 to Pho85. Importantly, this model accounts perfectly for the results of previous random mutagenesis studies on Pho80 and on the minimum domain, which had independently identified both the Pho80 groove and the minimum domain residues that bind it in the prediction as critical residues for inhibition of Pho85, and for integrity of the Pho85/Pho80/Pho81 complex. We find this alternative explanation for Pho85-Pho80 regulation by Pho81, which we can derive by combining the predictions with already published experimental data, an important element to re-evaluate the relevance of 1-IP7 in PHO pathway regulation and resolve one of the existing discrepancies.

      • p. 28: No experiments were carried out with plants or mammals. The relevance for plants or mammalian systems therefore seems to be overstated at this point in time.

      We are not quite sure how to interpret this remark. We do not claim that our data support a role for IP8 in mammals and plants. But we refer to and cite studies providing the strongest evidence in favor of it in these systems. The relevance of our current study lies in refuting seemingly strong evidence from yeast, which had been diametrically opposed to the data obtained in plants and mammals. The revision of the situation in yeast now paves the way to drawing a coherent concept for fungi, plants and mammals. We feel that this is important and should be underlined.

      • p. 31: "300 mL of 3% ammonium" - 300 µL?

      Yes. Thank you.

      • p. 45, CE-ESI-MS parameters: "1IP8"

      Corrected.

      • p. 47: Figure S1: Please include more experimental details in the caption and/or methods section. Was a similar analysis software used as e.g. Figure S2 (NIS Elements Software)? Please also include all the analysis software in the Methods section under "fluorescence microscopy". Unless these additional experimental details already clarify the following point: Can the authors briefly comment on why the morphological determination in S1 requires trypan blue staining while in later experiments the yeast cells are readily recognized by the software in "simple" brightfield images?

      Trypan blue staining is not strictly required for this. It is just a simple method to fluorescently stain the cell wall. There are many other ways of delineating the cells. It could also have been done in a brightfield image.

      We updated the figure legend to better describe how these measurements were done and deposited the script and training file on figshare.

      • p. 48: "can be downloaded from **" please insert the link once the script is available online.

      It has been deposited at Figshare under DOI 10.6084/m9.figshare.c.6700281

      Reviewer #3 (Recommendations For The Authors):

      1) Italicize the scientific names of the organisms; this was inconsistent throughout the manuscript. Also, gene names should be italicized; this was also inconsistent (e.g., p.12 "... did not induce the PHO84 and PHO5 [sic] promoters...").

      Done

      2) Summary of the Figure 2A data in the text (p.9) probably has swapped the determined concentrations for 1-IP7 and IP8 (0.3 µM or 0.5 µM) as compared with the data figure.

      Yes, indeed. We have corrected this.

      3) Figure 2A: which of the mutant PP-IP levels are significantly different from the WT control?

      We have now added asterisks to indicate the significance for every mutant.

      4) In the discussion on the data (Fig. 2A), I was tripped up by the verb tense in this phrase "5-IP7 has not been detected in the kcs1Δ mutant and 1-IP7 has been strongly reduced..."; I think you want to use the past tense "was" in both cases [as is used in the next sentence]. It made me wonder if there was a difference in the detection of 5-IP7 and IP8 in the kcs1Δ mutant, you could detect 5-IP7 but not IP8; if so, where did the 5-IP7 come from?

      We have corrected the tense. Thank you for highlighting this. As for the residual inositol pyrophosphate signal in kcs1Δ, we do not know its origin. One possibility, which we now mention in the text, is that it stems from IPMK side activity. It should be underlined that all signals disappear upon Pi starvation.

      5) Figure 2C: include the data points that the lines are built from (suggestion).

      We refrained from that for the line graphs. For reasons of consistency, we should do this for every line graph. If we did that, Fig. 4B would become quite hard to read.

      6) Figure 3B-D, please check that the stipples or hatches are in the figure - the printed copy lacked them although I could see them in the electronic version; this was also true for Figures 5 and 6 (I do not know if it is a printer issue, but other hatches were visible: e.g., not seen in S4 but seen in S5).

      They are visible in our copies, also after printing. They may have been lost during file conversion at the journal.

      7) The text description of the Pho4-yEGFP, Pho5-yEGFP and Pho84-yEGFP says that the kcs1Δ mutant "showed Pho4-yEGFP constitutively in the nucleus already ... and PHO5 and PHO84 were activated". However, the data is more complex than that: whereas the localization of Pho4-yEGFP is constitutively nuclear, there is a higher basal (repressed) expression of both Pho5 and Pho84 as well as increased expression of both proteins under -Pi conditions. What accounts for the increased expression when Pho4 is already nuclear? This is also seen in the vip1Δ kcs1Δ mutant.

      We agree with the reviewer, but we cannot explain this effect with certainty. One possibility could be a wider dysregulation of Pi metabolism in kcs1 mutants. To name a few possibilities: Wildtype cells have polyphosphate reserves that are gradually mobilized during the first hours of P-starvation. kcs1 mutants don't have those and might fall into a "deeper" state of starvation faster. It should be kept in mind that the starvation response is also regulated at the level of chromatin structure, and by antisense transcripts. The influence of kcs1 on these processes is unclear.

      8) Figure 9 legend: please add a definition of the MP region (in red) and include it more explicitly in the described model.

      We now mention the relevant region also in the legend and have labeled the relevant regions in the images (Huang et al., 2001).

      9) Figure S2 legend: information is missing (downloading link).

      It has been deposited at Figshare under DOI 10.6084/m9.figshare.c.6700281

      10) Figure S4 and S5, missing statistics.

      They have been added to the new Fig. S6, which interprets differences between strains and conditions. Fig. S4 (now S3) shows timecourses of IPPs down to zero. Adding statistics for all pairwise differences between the timepoints would be almost overkill.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      It is very important to find practical and efficient means in order to increase agricultural productivity. Drawing on data from variable field environments, this study provides a useful theoretical framework to identify new factors that could increase agricultural production. There is solid evidence to support the authors' claims, though following the fate of candidate species after introduction into rice fields would have strengthened the study. Plant biologists and ecologists working in nature and fields will find the work interesting.

      Thank you so much for your careful evaluation of our manuscript. We are very pleased to hear that you found our framework useful. We have revised our manuscript according to the "Recommendations for the Authors" to improve our manuscript.

      Public Review

      Reviewer #1 (Public Review):

      This manuscript describes the identification of influential organisms on rice growth and an attempt of validation. The analysis of eDNA on rice pot and mimic field provides rice growth promoting organisms. This approach is novel for plant ecology field. However current results did not fully support whether eDNA analysis-based detection of influencing organism.

      Thank you so much for evaluating our manuscript. We have carefully read and responded to your comments. We hope our responses resolve your concerns about our study.

      The strength of this manuscript is to attempt application of eDNA analysis-based plant growth differentiation. The weakness is too preliminary data and experimental set-up to make any conclusion. The trials of authors experiments are ideal. However, the process of data analysis did not meet certain levels. For example, eDNA analysis of different time points on rice growth stages resulted in two influential organisms for rice growth. Then they cultivate two species and applied rice seedlings. Without understanding of fitness and robustness, how we can know the effect of the two species on rice growth.

      We agree with your comments that we did not have the fitness data of the two species and/or rice seedlings. Thus, it is still difficult to obtain deep understanding of the mechanisms of our findings that the species introduced in the system would influence rice growth. Nonetheless, our study demonstrated the effectiveness of our research framework as we found evidence that the species that were discovered by the eDNA monitoring and time series analysis indeed cause changes in the system. We believe that the first step is to show that the framework is workable and that detailed understanding of the mechanisms or genetic pathway was not a focus of our study. To avoid misunderstanding, we have added several explanations regarding this point in L426–431 and L447. For example, in L426, we have added the following statement: "... the detailed dynamics of the two introduced species was unclear (i.e., the fate of the introduced species). This is particularly important for understanding how the introduced organisms affected rice performance...".

      The authors did not check the fate of two species after introducing into rice. If this is true, it is difficult to link between the rice gene expression after treatments and the effectiveness of two species. I think the validation experiment in 2019 needs to be re-conducted.

      We did not check the fate of the two species (except measuring the eDNA concentrations of the species), and it is true that we cannot show evidence of "how" these two species influence the rice gene expression. Understanding molecular mechanisms of the phenomenon that we found is important (especially from the viewpoint of molecular biology), but our primary objective was to demonstrate that our "eDNA x time series analysis" framework is feasible for detecting previously overlooked but influential organisms. To this end, we believe that we achieved our objective and repeating the validation experiment should be for a different purpose (i.e., for understanding molecular mechanisms). We have clarified these points in L426–431 and L447 as explained above.

      Reviewer #2 (Public Review):

      The manuscript "Detecting and validating influential organisms for rice growth: An ecological network approach" explores the influence of biotic and abiotic entities that are often neglected on rice growth. The study has a straightforward experimental design, and well thought hypothesis for explorations. Monitoring data is collected to infer relationships between species and the environment empirically. It is analyzed with an up-to-date statistical method. This allowed the manuscript to hypothesize and test the effects most influential entities in a controlled experiment.

      Thank you so much for your careful evaluations. We are pleased to see that you evaluated our manuscript positively. We have further revised our manuscript according to your comments and hope the revision has resolved your concerns.

      The manuscript is interesting and sets up a nice framework for future studies. In general, the manuscript can be improved significantly, when this workflow is smoothly connected and communicated how they follow each other more than the sequence and dates provided. It is valuable philosophical thinking, and the research community can benefit from this framework.

      Thank you for your suggestions. To improve the logical flow and readability of our manuscript, we have revised the descriptions of the workflow and clarified how the experimental and statistical steps are connected to each other. To do so, we have added brief explanations of what we did, and how, in the first sentence of each Results subsection (some of these explanations were only in Materials and Methods in the original manuscript). Also, we have moved all of the Supplementary Materials and Methods to the main text. We have thoroughly revised the manuscript, and we hope that all the parts of our manuscript are now connected more smoothly than in the original manuscript.

      I understand the length and format of the manuscript make it difficult to add more details, but I am sure it can refer to/clear some concepts/methods that might be new for the audience. How/why variables are selected as important parts of the system, a tiny bit of information about the nonlinear time series analysis in the early manuscript, and the biological reasoning behind these statistically driven decisions are some examples.

      We have explained how/why variables are selected (in L125), added more information about the nonlinear time series analysis (in L129 and L175), and added the biological reasoning behind the statistical decisions (L195).

      Reviewer #3 (Public Review):

      Most farming is done by subtracting or adding what people want based in nature. However, in nature, crops interact with various objects, and mostly we are unaware of their effects. In order to increase agricultural productivity, finding useful objects is very important. However, in an uncontrolled environment, it coexists with so many biological objects that it is very inefficient to verify them all experimentally. It is therefore necessary to develop an effective screening method to identify external environmental factors that can increase crop productivity. This study identified factors presumed to be important to crop growth based on metabarcoding analysis, field sampling, and non-linear analysis/information theory, and conducted a mesocosm experiment to verify them experimentally. In conclusion, the object proposed by the author did not increase rice yield, but rather rice growth rate.

      Thank you so much for your evaluation of our manuscript. We have revised our manuscript based on your comments, and hope it has been improved compared with the original version.

      Strength

      In actual field data, since many variables are involved in a specific phenomenon, it is necessary to effectively eliminate false positives. Based on the metabarcoding technique, various variables that may affect rice growth were quantitatively measured, although not perfectly, and the causal relationship between these variables and rice growth was analyzed by using information transfer analysis. Using this method, two new players capable of manipulating rice growth were verified, despite their unknown functions until now. I found this process to be very logical, and I think it will be valuable in subsequent ecological studies.

      We are very pleased to see that you found our framework is very logical and potentially beneficial for future ecological studies.

      Weaknesses

      CK treatment's effectiveness remains questionable. Rice's growth was clearly altered by CK treatment. The validation of the CK treatment itself is not clear compared to the GN treatment, and the transcriptome data analysis results do not show that DEG is not present. The possibility of a side effect caused by a variable that the author cannot control remains a possibility in this case. Even though this part is mentioned in Discussion, it is necessary to discuss various possibilities in more detail.

      We agree that the effectiveness of the CK treatment was questionable. We have added some more discussion about this point in L376: "The unclear effects of the CK treatment relative to those of the GN treatment could be due to the relatively unstable removal method (i.e., C. kiiensis larvae were manually removed by a hand net) or incomplete removal of the larvae (some larvae might have remained after the removal treatment)."

      Reviewer #1 (Recommendations For The Authors):

      Comment #1-1 This manuscript describes identification of influential organisms on rice growth and an attempt of validation. The analysis of eDNA on rice pot and mimic field provides rice growth promoting organisms. This approach is novel for plant ecology field. However current results did not fully support whether eDNA analysis-based detection of influencing organism.

      Thank you for your careful evaluations of our manuscript. We are pleased to see you found that our approach is novel. We have revised our manuscript in accordance with your comments, and we hope that the revision and responses resolved your concerns.

      Comment #1-2 1. Experimental setting: Authors made up small scale pot system in 2017 and then expanded manipulative experiment. I do not understand how two influencing organism sequences were identified from the single treatment depending on different time points. How they can be convince the two organisms affect the rice growth rather than other biological and environmental factors.

      In 2017, we performed intensive monitoring of the experimental rice plots and obtained a large time series dataset (122-day consecutive monitoring x 5 plots = 610 data points). The time series data were analyzed using the information-theoretic causal analysis. The analysis is critically different from correlational analyses and is designed to identify causal relationships among variables. Although we understand that field manipulation experiments are a common and straightforward approach to identify causal relationships among organisms, we chose the "field-monitoring + time-series-based causal analysis" approach. This is because, as explained in the main text, there are numerous factors that could influence rice performance, and it is practically impossible to perform manipulative experiments for all the potential factors that could influence rice growth. On the other hand, our "field-monitoring + time-series-based causal analysis" approach has the potential to identify multiple factors under field conditions, even with a single experimental treatment.

      Nonetheless, we must admit that our time-series-based approach still has a chance to misidentify causal factors. Our framework relies on statistics, so the chance of false-positive detection of causality cannot be zero. This was exactly the reason why we performed the "validation" experiment in 2019. To complement the statistical results of the 2017 experiments, we performed another experiment in 2019.

      Comment #1-3 2. eDNA technology: The eDNA analysis based on four universal primers 16s rRNA, 18s rRNA, ITS, and COI regions must not be enough to identify specific species. The resolution of species classification may not meet to confirm exact species. Thus, the accuracy of two species that they selected for further experiment is difficult to be confirmed. Authors also referred to "putative Globisporangium".

      Your point is correct. The DNA barcoding regions we selected are short and it is often difficult to identify species. However, this limitation could not have been overcome even if we had chosen a different genetic marker. The long-read sequencing technology could partially solve the issue, but the number of sequence reads generated by the long-read technique is less than that by the short-read sequencing technology, and comprehensive detection of all species in an ecological community was still challenging. Our approach struck a balance among the identification resolution, comprehensiveness of the analysis, and sequencing costs. In addition, even though we could not identify most ASVs at the species level, some ASVs could be identified at the species level (52 ASVs among the 718 ASVs which had causal influences on rice growth), and we selected the two species (G. nunn and C. kiiensis) from the 52 species.

      Further, the taxonomic assignment algorithm we used here (i.e., Claident; Tanabe & Toju 2012 PLoS ONE 10.1371/journal.pone.0076910) adopts conservative criteria for species identification and has a low false-positive probability.

      More importantly, this is also the reason why we performed the "validation" experiment in 2019. The species identified in the 2017 experiment are still "potential" organisms that influence rice growth (i.e., the hypothesis-generating phase), and we tested the hypothesis in 2019.

      Nonetheless, we must admit that clear description of potential limitations is important. Thus, we have discussed this in L418: "As for the second issue, short-read sequencing has dominated current eDNA studies, but it is often not sufficient for lower-level taxonomic identification. Using long-read sequencing techniques (e.g., Oxford Nanopore MinION) for eDNA studies is a promising approach to overcome the second issue".

      Comment #1-4 3. Biological relevance 1: Authors identify two organisms as influencing organism for rice growth. As conducting the first experiment in 2017, the 2019 experiment was different from natural condition. The two experiments in 2017 and 2019 were conducted under different conditions. How do they compare the experiments? At least, the eDNA analyses in 2017 and 2019 should be very similar. I cannot find such data.

      The experimental conditions were different between 2017 and 2019 because they were conducted in different years. Theoretically, it is ideal if the experimental conditions in 2019 are covered by the range of experimental conditions in 2017 (e.g., rice variety, air temperature, rainfall, and solar radiation). If this condition were satisfied, the attractor (i.e., rice growth trajectory delineated in the state space) in 2019 would be within that in 2017, and our model prediction in 2017 would be used to predict dynamics in 2019 accurately. To fulfill the conditions, we made as much effort as possible: we used the same rice variety and soils in 2019 as those used in 2017, and started our experiment at the same timing in 2019 as that in 2017.

      Although natural ecological dynamics cannot be precisely controlled, our monitoring revealed that the ecological dynamics in 2019 were qualitatively similar to those in 2017. To demonstrate that the experimental conditions and eDNA community data were similar between the two experiments, we have presented the climate and eDNA data in an inset figure in Figure 3a, Figure 1–figure supplement 2, and Figure 3–figure supplement 2. We must admit that these dynamics are not identical, but we hope that this resolves your concern.

      Comment #1-5 4. Lack of detail description: In the Materials and Methods, there are many parts which lack on detail description. For instance, authors must described the two species cultivation, application concentrations, and application methods.

      We have moved Supplementary Materials and Methods to the main text and added more detailed descriptions in Materials and Methods. Also, to improve the logical flow and readability of our manuscript, we have added brief explanations of what we did, and how, in the first sentence of each Results subsection (some of these explanations were only in Materials and Methods in the original manuscript). We have added references on how to cultivate G. nunn in L608 (Kobayashi et al., 2010; Tojo et al., 1993), as well as the application concentrations (C. kiiensis was not cultivated but removed from the system, as described in Materials and Methods). Application methods are described in Materials and Methods, in the section "Field manipulation experiments in 2019" (L596).

      Comment #1-6 5. Validation: Application of one species clearly resulted to promote rice growth. They must include appropriate control treatment. If they pick same genus but different species that identified no specific effect on rice growth through eDNA analysis, no effect on growth can be provided. Generally application of large population of certain non-harmful organism confer plant growth promotion. It is not surprising result. Authors need to prove effectiveness of eDNA analysis. In addition, the field experiments required at least two years of consistent data for publication because environmental factors are so dynamic.

      Thank you for pointing this out. We agree with your comment that species that were predicted to have no effect should not promote rice growth in a validation experiment. It was also one of our initial experimental plans to include such species in our manipulation experiment in 2019, but we could not include them because of the limitation of time, labor, and money. More extensive validation of the statistical results of the 2017 data, including multi-year experiments, would further validate the effectiveness of our approach, which should be done in future studies. To clarify this point, we have added statements in the paragraph starting at L396.

      Comment #1-7 In conclusion, I suggest that authors need more large data analysis and validate with more accurate and meaningful protocol.

      As we explained in the revised manuscript and the Response to Comments #1-2 to #1-7, our study demonstrated a novel research framework to detect previously overlooked influential organisms under field conditions. We agree that larger data analysis would be ideal to further validate our approach, but whether and how to collect larger data is constrained by time, money, and labor. We believe that our study was designed carefully and could provide meaningful avenues for developing ecological-network-based, novel, and environment-friendly agricultural solutions.

      Reviewer #2 (Recommendations For The Authors):

      Comment #2-1 Lines 97-110: This is so cool. Modeling with empirical data is very powerful. But a rice field is an open system consisting of metacommunity dynamics. Maybe a tiny bit of biological and biogeochemical background here would be good.

      Thank you for your comments. We have added a few examples of how and in which systems these methods were used to evaluate community dynamics and detect biological interactions in L109-L118.

      Comment #2-2 Lines 111-126: I like the summary of the study here. I think the influential species concept can be a little more elevated. Paine's famous keystone species work has been cited but a couple more pieces of literature can help to enhance the ecological importance of this work.

      We have explained the work by Paine (1966) a bit more and added one more paper that showed the effect of multiple predator species on the system dynamics at L88. We have also added a relevant sentence at L137 to emphasize the ecological/agricultural significance of our work.

      Comment #2-3 Experimental design/Figure 1:

      Is there any rationale behind choosing red individuals to measure the growth?

      Is there any competition between the individuals in the pots?

      Figure 1e: It is nice to show the ASVs in time. I wonder how the plot would look like when normalized by biomass/DNA content/coverage/rarefaction because of the seasonality.

      As for the first question, we chose the four individuals to minimize the edge effects (i.e., effects of microclimates and neighboring rice would be different between the four rice individuals and those planted in the edge regions). We have mentioned this in the legend of Figure 1.

      As for the second question, there might be competition among the individuals in the pot. However, we did not measure the effect of competition (e.g., by comparing the growth with/without other rice individuals).

      As for the third question, we published detailed dynamics of ecological community in the Supplementary Figures in Ushio (2022) Proceedings B https://doi.org/10.6084/m9.figshare.c.5842766.v1. In addition, we have uploaded a video showing the temporal dynamics of some top (= most abundant) ASVs in https://doi.org/10.6084/m9.figshare.23514150.v2.

      We have mentioned the supporting information in L153.

      Comment #2-4 Line 146-147: Is this damage influence the inferences? Maybe it is better to justify.

      While we occasionally observed physical damage, it is unlikely that it affected our causal inference because the changes in rice height due to the damage were smaller and less frequent than those due to growth. We have noted this at L151.

      Comment #2-5 Line 161-162: Maybe refer readers to the methods section where you explain UIC analysis. It'd be easier to interpret the figures.

      Mentioned.

      Comment #2-6 Line 175-176: I believe very brief information in the intro about the organisms might help explain the hypothesis and interpret the results better.

      We have included brief information of the two species at L197.

      Comment #2-7 Figure 2: Species interaction strength: Are these proxies to the Jacobians? Is there a threshold for the influence we can consider strong/weak? For example, influential species compared to diagonal elements of the Jacobians (intraspecies interactions) could be shown as a mean vertical line in Figure 2b.

      "Influences to rice growth" in Figure 2b is transfer entropy (TE) from a target ASV to rice growth. They are not proxies of the Jacobians, but they might positively correlate with the absolute value of the Jacobians. We have clarified this point in the legend (L953). More direct estimations of the Jacobian can be done using the MDR S-map method (Chang et al. 2021 DOI:10.1111/ele.13897), but we did not perform the MDR S-map in the present manuscript (see Ushio et al. 2023 https://doi.org/10.7554/eLife.85795 for the application of the MDR S-map). As for TE, there is no clear threshold to distinguish strong/weak interactions.
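
      To make the notion of transfer entropy mentioned above more concrete, the following is a minimal sketch of a lag-1, binned plug-in estimator in Python. It is not the UIC implementation used in the study; the function names, the equal-width binning, the number of bins, and the synthetic series are all assumptions chosen purely for illustration.

```python
import math
import random
from collections import Counter


def discretize(series, n_bins=3):
    """Map a numeric series to integer bin labels using equal-width bins."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in series]


def transfer_entropy(source, target, n_bins=3):
    """Plug-in estimate (in bits) of lag-1 transfer entropy from source to target.

    TE = sum p(y_next, y_now, x_now) * log2[ p(y_next | y_now, x_now) / p(y_next | y_now) ]
    """
    x = discretize(source, n_bins)
    y = discretize(target, n_bins)
    triples = list(zip(y[1:], y[:-1], x[:-1]))  # (y_next, y_now, x_now)
    n = len(triples)
    c_full = Counter(triples)
    c_next_now = Counter((yn, yc) for yn, yc, _ in triples)
    c_now_src = Counter((yc, xc) for _, yc, xc in triples)
    c_now = Counter(yc for _, yc, _ in triples)
    te = 0.0
    for (yn, yc, xc), count in c_full.items():
        p_joint = count / n
        p_given_both = count / c_now_src[(yc, xc)]       # p(y_next | y_now, x_now)
        p_given_self = c_next_now[(yn, yc)] / c_now[yc]  # p(y_next | y_now)
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te


# Hypothetical data: x drives y with a one-step delay; y does not drive x.
random.seed(0)
x = [random.random() for _ in range(610)]
y = [0.0] + [0.8 * x[t] + 0.2 * random.random() for t in range(609)]

te_xy = transfer_entropy(x, y)  # information flow from x to y
te_yx = transfer_entropy(y, x)  # information flow from y to x
print(f"TE x->y = {te_xy:.3f} bits, TE y->x = {te_yx:.3f} bits")
```

      In this toy setting the estimate from the driver to the driven series should be clearly larger than in the reverse direction. As noted above, there is no universal threshold separating strong from weak values, and a plug-in estimator of this kind is biased upward for short series.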

      Comment #2-8 Figure 2: Looking at panels c and d, it looks like there is a negative frequency selection between two influential species. Is it a reasonable observation?

      This is an interesting point. In this manuscript, we have not carefully examined the interspecific relationship between these two particular species. However, the interspecific interactions were examined in detail and reported in Ushio (2022) (Proceedings of the Royal Society B, DOI:10.1098/rspb.2021.2690). We re-checked the result in Ushio (2022); although there is a negative correlation between them, we did not find any (statistical) causal relationship between them.

      Comment #2-9 Line 209: What is t-SNE analysis? Because of the manuscript's format, maybe methods should be shortly referred to in the relevant section or explained in brackets.

      We have spelled out t-SNE.

      Comment #2-10 Line 212-214: Maybe briefly explain what the hypotheses are for the alternative analysis, and what is the contribution of the results to the study.

      We have added a brief explanation at L241: "Alternative statistical modeling that included the treatments (the control versus GN or CK treatments) and manipulation timing (i.e., before or after the manipulation), which simultaneously took the temporal changes of all the treatments into account, also showed qualitatively similar results (Supplementary file 4), further supporting the results."

      Comment #2-11 Figure 3b/c: Maybe species names as panel titles could be helpful. d: Treatment names with initials in the legend could be also helpful to read the plots.

      We have added species names as panel titles of Figure 3b,c. Treatment names were included in the legend of Figure 3.

      Comment #2-12 Line 233: Maybe mention why the manuscript uses the word "clear".

      We have mentioned this in L185.

      Comment #2-13 Line 234-236: I think that these alternative tests should be explained somewhere.

      We have revised the sentence so that it includes some explanations (L241). Also, we have referred to Materials and Methods.

      Comment #2-14 Figure 4: The title says ecological community compositions, and panels show the growth rates and cumulative growth.

      Thank you for pointing this out. This was a typo and we have corrected it.

      Comment #2-15 Lines 246-269: Can these expression patterns be transient and relevant to the time point that the sample is taken?

      Yes, these expression patterns were transient. We collected rice leaf samples for RNA-seq 1 day before the first manipulation and 1, 14, and 38 days after the third manipulation (see Supplementary file 3 for the sampling design). When we merged the pot locations, we observed no difference in gene expression for samples taken 1 day before the first manipulation and 14 and 38 days after the third manipulation (except for two genes in samples taken 38 days after the manipulation). Thus, we consider that the DEGs appeared only in the short period after the manipulation. We have mentioned this in L278 and L383: "We found almost no DEGs for leaf samples taken one day before and 14 and 38 days after the third manipulation (the leaf sampling event 1, 3, and 4), suggesting that the influences of the treatments on the gene expression patterns were transient." (L278) and "These changes were observed relatively quickly and transient." (L383)

      Comment #2-16 I wonder if a conceptual framework figure would help to generalize the workflow that can be used for other studies.

      Thank you for your suggestion. Although we agree with your comment that such a figure would be helpful to generalize the workflow, we believe that our framework is clear and decided not to include it in the present manuscript. We might consider including such a figure (like Figure 1a in Ushio 2022) if we have an opportunity to write a review paper regarding this topic.

      Comment #2-17 Lines 329-335: I feel this information is unclear in the early manuscript. Maybe it's necessary to clearly communicate in the beginning.

      We have explained in L189 that we could not find any relevant information, at least at the time we detected the ASVs.

      Comment #2-18 Lines 336-337: Can these species be identified in the previous data set from the ASV sequences?

      Yes, these species were identified in the DNA data set obtained in 2017.

      Comment #2-19 Lines 387-397: Are there any measurements such as total biomass, and statistical methods to help with the eDNA bias and data compositionality?

      We have confirmed that our quantitative eDNA metabarcoding generates results comparable to those of the fluorescence-based method and quantitative PCR (e.g., see Supplementary Figures in Ushio 2022) (mentioned in L310 in the revised manuscript). However, at least in this study, we could not perform a direct comparison of the eDNA data with species abundance and/or biomass. This is partly because the number of our target species was too large (> 1,000 species). The accurate estimation of species abundance and/or biomass is one of our next goals.

      Comment #2-20 Line 472: Maybe mention transfer entropy somewhere in the early manuscript.

      We have mentioned this in L175.

      Comment #2-21 Lines 494-503: Maybe a summary of this reasoning should be mentioned somewhere in the early manuscript too.

      We have described a brief summary of the reasoning in L195.

      Comment #2-22 Lines 29-33 If this sentence is simplified it might be easier to follow.

      The sentence has been divided into two sentences in L28. Also, each sentence has been simplified.

      Comment #2-23 Line 38 Maybe "macrobes" can be explicitly mentioned. Fungi, protozoa, etc.

      Mentioned.

      Comment #2-24 Line 139: I am not sure if the date should be in the title.

      Similar monitoring was done in 2017 and 2019. Thus, we think the date is necessary in the section title.

      Comment #2-25 Figure 1: There are 4 red individuals in the design but 5 measurements in the plots.

      Heights and SPAD of the four individuals were measured for each plot and the averaged values were used as representative values for each plot. Therefore, 20 measurements (= 4 rice individuals × 5 plots) were done every day, but each plot has one rice height for each day. We have clarified this in the legend of Figure 1: "the average values of the four individuals were regarded as representative values for each plot."

      Comment #2-26 Figure 1b: Maybe use the same axis length for the temperature as the other plots?

      Corrected.

      Comment #2-27 Lines 259-261: Are there the names of the genes in databases?

      Yes, these are gene names used in the rice databases (e.g., The Rice Annotation Project Database; https://rapdb.dna.affrc.go.jp/index.html).

      Reviewer #3 (Recommendations For The Authors):

      Comment #3-1 Additionally, RGR is not statistically significant, but statistical significance is observed only in cumulative growth because data presentation does not reflect plant characteristics. RGR changes according to the developmental stage of the plant. Therefore, if RGR data are shown separately according to the rice growing season, the cumulative growth pattern and the pattern will appear similar.

      RGRs were calculated daily (i.e., cm/day) and they changed depending on the developmental stage of the rice (Figure 1 and Figure 4–figure supplement 1). Therefore, we might find similar RGR patterns if we focus on a specific period of the growing season. Unfortunately, however, we performed the intensive (i.e., daily) monitoring in 2019 only during the field manipulation period (mid-June to mid-July 2019), and we cannot investigate the changes in cumulative growth throughout the growing season (this depends on how many days we add up RGR to calculate the cumulative growth, though). We agree that, if we had investigated the detailed pattern of RGR throughout the growing season in 2019, we could have found similar patterns between RGR and cumulative growth at a certain period in the growing season. In Figure 4, the cumulative growths were calculated based on the RGRs before the third manipulation or during 10 days after the third manipulation. We clarified this in the legend of Figure 4.
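
      The relationship between daily RGR and cumulative growth described above can be illustrated with a minimal sketch. It assumes that the daily rate (cm/day) is the difference between consecutive daily plot-level heights and that cumulative growth is the sum of those rates over a chosen window. The function names and the height values are hypothetical and are not taken from the study.

```python
from typing import List


def daily_growth_rates(heights_cm: List[float]) -> List[float]:
    """Daily growth (cm/day): difference between consecutive daily heights.

    Assumption: one plot-level height per day, in cm.
    """
    return [later - earlier for earlier, later in zip(heights_cm, heights_cm[1:])]


def cumulative_growth(rates: List[float], start: int, n_days: int) -> float:
    """Sum of daily rates over a window of n_days beginning at index start."""
    return sum(rates[start:start + n_days])


# Hypothetical plot-level heights (cm), one value per day; not data from the study.
heights = [30.0, 31.5, 33.5, 34.0, 36.5, 38.0]
rates = daily_growth_rates(heights)
total = cumulative_growth(rates, start=0, n_days=5)
```

      Under these assumptions, the cumulative value depends directly on the length of the window over which the daily rates are added up. Small day-to-day differences that are individually noisy can therefore add up to a clearer difference in the cumulative measure.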

    1. 12:3 Those who are wise[a] will shine like the brightness of the heavens, and those who lead many to righteousness, like the stars for ever and ever. https://www.americamagazine.org/politics-society/2020/05/08/its-time-rethink-electoral-college https://www.npr.org/sections/itsallpolitics/2011/12/20/144016912/we-the-people-npr-readers-would-ratify-four-new-amendments https://constitutioncenter.org/blog/vote-now-an-amendment-to-end-the-electoral-college https://www.nytimes.com/2020/02/09/opinion/letters/electoral-college.html https://www.latimes.com/opinion/readersreact/la-ol-le-electoral-college-20180904-story.html you are offline https://slate.com/news-and-politics/2014/05/amending-the-constitution-is-much-too-hard-blame-the-founders.html we the people rise again https://slate.com/news-and-politics/2012/06/fix-the-constitution-amending-by-national-referendum.html safe souls, safe fu https://slate.com/news-and-politics/2012/06/fixing-the-constitution-protecting-informational-privacy.html https://slate.com/news-and-politics/2020/05/new-reconstruction-constitution-democracy.html We the People of Slate … The U.S. Constitution, as you mighta been, shoulda [“come” on … its someday] rewrϕte it. "Politicians talk about the Constitution as if it were as sacrosanct as the Ten Commandments [interjection: spec. it is actually almost exactly related!]. But the document itself invites change and revision. What if the president served only one six-year term instead of two four-year terms? What if your state’s population determined how many senators represent it? What if the Constitution included a right to health care? We asked legal scholars and Slate readers to cross out what they didn’t like in the Constitution and pencil in their hearts’ desires. 
Here’s what the document would look like with their best ideas." Slate: u_s_constitution as_rewritten by_slate_legal_experts_and_readers 多也了了夕 "with a wand of scheffilara, 并#亦太 he begins … "I am now on the Staff of Menelaus, the Spears of Longinus and Lancelot; and the name "Mosche ex Nashon." Logically the recent mentions of Gilgamesh and the simultaneous 同時 overlaping 場道 of the eventual link between the famous ruling of Solomon on the separation of babies and mothers and waters and land … to a story of many “two cities” that culminates in a cultural or societal or “evolutionary” link to Sodom and Gomorrah and the city-state of Babylon (and it’s Hanging Gardens) and also of course to Paris and Troy and “Masstodon” and city-states [ciudadestado] and perhaps planet-cities; from Cambridge to Cambridge across the “Cable” to see state to “London” … recently I called it “the city of realms” … I started out logically intending to link “game theory” and John Nash to the mathematical story of Sputnik and a revival of American physics; but in my usual way of rambling into the woods [I mean neighborhood] of stream of consciousness … turned into a premonitory discourse of “two cities” and how sometimes even things as obvious as the number of letters in the word “two” don’t do a good enough job of conveying … how and/or why one is simply never enough, and two isn’t much better–but in the end a circle … is drawn; the perfect circle in our imaginary mathematical perfection … I see a parted “line” in the letter pronounced “tea” (and beginning that word); and two “vee” (pron. of “v”) symbols joined together in a word we pronounce as “double-you” … and symbolically because I know “V” is the Roman Numeral for 5 (five) and I know not how to multiply in Roman numerals– It’s important to pause; here. 
I am going to write a more detailed piece on “the two cities” as I work through this maze like crossroads between “them” and “demo…” … here demorigstrably I am trying to fuse together an evolutionary change in … lit. biological evolution as well as an echelon leap forward in "self-government" … in a place where these two things are unfathomable and unspokenly* connected. https://www.google.com/search?q=prometheuslocke+%2Bsite%3Agodlikeproductions.com “Silence is betrayal” -MLK To a question on the idiom; is Bablyon about “the law” or “of the land of Nod?” “What is democracy” … the song, Metallica’s “ONE” echoes and repeats; as we apparently scrive together the word “THEM” … I question myself … if Babylon were the capital city of some mythical Nation of Time … if it were the central “turning point” of Sheol; ... >|< Can you not see that in this place; in a world that should see and does there is a gigantic message proving that we are not in reality and trying to show us how and why that's the best news since ... ever---that it's as simple as conjoining "the law of the land" with a basic set of rules that automatically turn Hell into something so much closer to Heaven I just do not understand---why we cant stand up together and say "bullets will not kill innocent children" and "snowflakes will not start avalanches ...." that cover or bury or hide the road from Earth to Verital)e .... or from the mythical Valis to Tanis---or from Rigel to Beth-El ... "guess?" ## as "an easy" answer; I'm looking for a fusion of "law and land" that somehow remembers a "jok'er a scene" about "lawn" seats; and "where the girls are green;" It's as simple as night and day; Heaven and Hell ... the difference between survival and--what we are presented with here; it's "doing this right"--that ends the Hell of representative democracy and electoral college--the blindness and darkness of not seeing "EXTINCTION LEVEL EVENT" encoded in these words and in our governments foundation ... 
by the framers [not just of the USA; but English .. and every language]  ... is literally just as simple as "not caring" or thinking we are at the beginning of some long process--or thinking it will never be done--that special "IT" that's the emancipation of you and I. Here words like "gnosis" and "gaudeamus" pair with my/ur "new ntersanding*" of the difference between Asgard and Medgard and really understanding our purpose here is to end "evil" ... things like "simulating disease and pain" (here, simulating meaning ... intentionally causing, rather than "gamifying away") and successfully linking the "Pillars of Hercules" to Plato's vision of Atlantis and the letter sequences "an" and "as" ... unlock a fusion of religion and mythology and "cryptographic truth" that connects "messianic" and "Christian" to "Roman" ... "Chinese" and "American" ... literally the key to the difference between the phrases "we are" and "we were" .... in "sight" of "silicon" in simulation and Israel, Genesis, and "silence" ... trying to the raising of Asgardian enlightenment ... and seeing "simple cypher" connecting to "Norse" ... and the "I AM THAT" surer than shit ... the intention and design of all religion and creation is to end "simulated reality" and also not seeing "SR" ... in Israel and Norse ... "for instance." https://www.google.com/search?q=%22I+AM%22+%22WE+ARE%22+%2Bsite%3Afromtaws "SOIS" a key--in two languages conjugated literally as both "I AM" and "WE ARE" simultaneously; Search: I know that if I am than so are you ... and it is because we have overcome .... something I truly cannot figure out, fathom, or believe ... was truly here before us--a spiralling series of failures ... speaking: to the heavens; but in secret and in action; "doing everything possible to succeed." It's a simple linguistic concept; the "singularity" and the "plurality" of a simple word--"to be"--but it goes to the heart of everything that we are and everything that is around us. 
This is a message about understanding and preserving individuality as well as liberty; and literally seeing "ARXIV" and understanding "often" and failing to connect God and prescience to "IV" and the Fourth Amendment ... it's about blindness and ... "curing the blind instantly" ... and fathoming how and why this message has been etched into our entire history and and all religions and myths and music--to help us "to be THAT we" that actually "are responsible" for the end of Hell. I neglected to mention "Har-Wer" and "Tower of Babel" which are both related lingusitically, religiously and topically: "to who ..." and while we're on "four score and [seven years from now]" seeing the fourth "living thing" in Eden and it's (the name, Abel) connection to Babel and Abraham Lincoln; slavery and ... understanding we live in a place where the history of the United States also, like Monoceros and "Neil Armstrong's first step" are a time shifted ... overlayed map to achieving freedom ... it's about becoming a father-race ... and actually "doing" the technological steps required to "emancipate the e's of 'me&e'" and survive in exo-planetary space--- it might be as simple as adding "because we did this" here and now; and having it be something we are truly proud of .... forevermore™ ... for certain in the heart of this story about cyclicality and repetition of error--its not because we did "this" or something over and over again; it's about changing "the problem" and then helping others to also overcome ... "things like time travel ... erasing speech" --- however that happenecl. I also failed to mention that "I am in Hell" ... as in this world is hellacious to me; in an overlay with the Hellenic period and this message that we are in the Trojan Horse ... a small gem .... "planet" truly is the Ark of the Covenant---and it's the simple understanding that "reality is hell" is to "living without air conditioning and plumbing is hell" just as soon as you achieve ... 
"rediscovering" those things--- I can't figure out why I am the only person screaming "this is Hell." That's also, Hell. ... but recently suggested an old joke about "there being 10 kinds of people in the world (obv an anti-tautology and a tautology simultaneously)" only after that brief bit of singularity and duality mentioning the rest of the joke: "those that understand binary and those that don't know how to base convert between counting with two hands and counting with only an 'on and off.'" It's not obvious if you aren't trying to figure it out, I suppose; but 10 is decimal notation for "kiss" and the "often" without "of" ... and binary notation for the decimal equivalent of "2." A long long time ago in a state that simply non-randomly ties to the heart of the name of our galaxy ... I was again thinking of the "perfect imperfections" of things like saying "three equals one equals one" (which, of course was related to the Holy Trinity and it's "prescient/anachronistic Adamic presence encoded in the name Ab|ra|ha|m" which means "father of a great multitude") ... I brought that one back in the last few months; connecting the letter K and in this "logos-rythmic" tie to the "base of a number system" embellish the truth just a bit and suggest a more accurate rendition of the original [there is no such thing as equality, "is" of separate objects--as in no two snowflakes are the same unless they are literally the same one; true of ancient weights and with the advent of (thinking about) time no two "planets" are the same even if they're the exact same one--unless it's at a fixed moment in time. This name may be viewed either as meaning "father of many" in Hebrew or else as a contraction of ABRAM (1) and הָמוֹן (hamon) meaning "many, multitude". The biblical patriarch Abraham was originally named Abram but God changed his name (see Genesis 17:5). https://en.wikipedia.org/wiki/Yeshua#Yeshua,_Yehoshua,_and_Yeshu_in_the_Talmud K=3:11 ... 
to a handle on the music, the DHD of the gate and the *ring of David's "sling" ... ---and that's a relationship of "3 is to 11" as [the SAT style "analog]y" as a series of alpha, two mathematic, and two numeric symbols ... may only tie in my mind alone to the books of Genesis and Matthew and the phrase "chapter and verse" and to the stories of Lot and Job ... again in Genesis and the eponymous "Book of Job." So ... "tying up loose ends one 10b [III] iv. " as it appears I've taken it upon myself to call a Job and suggest is my "Lot in life [x]i* [3]" I worry sometimes that important things are missing, or will disappear---for instance Mirriam Webster, which is a "canonical/standard dictionary) should probably have an entry for "lot in life" non-idiomatically as "granny apples to sour apples" as 2 MANY ALSO ICI; 1twoⅱ ... following in Mitnick's bold introductory word steps; the curve and the complement ... the missiles and the canoes; the line and the blank space ... "supposedly two examples of two kinds, which could be three not nothings ... Today I write about something monumental; as if as important as the singularity depicted in Arthur C. Clarke's 2001 "A Space Odyssey" ... and remember a day when I thought it very novel and interesting to see the words "stillborn and yet still born" connected in a single piece of writing to "Stillwater and yet still water" ... today adding in another phrase noting the change wrought only by one magical single "space" (also a single capital letter; and a third phrase): "block chains with a great blockchain." 
http://www.goodmath.org/blog/2015/07/21/arabic-numerals-have-nothing-to-do-with-angle-counting/ https://gizmodo.com/no-this-viral-image-does-not-explain-the-history-of-ar-1719306568 https://en.wikipedia.org/wiki/Chinese_word_for_%22crisis%22 https://dictionary.hantrainerpro.com/chinese-english/translation-ji_howmany.htm https://dictionary.hantrainerpro.com/chinese-english/translation-duo_many.htm https://en.wikipedia.org/wiki/Euripides, Iphigenia in Aulis or Iphigenia at Aulis[1] (Ancient Greek: Ἰφιγένεια ἐν Αὐλίδι, Iphigeneia en Aulidi; variously translated, including the Latin Iphigenia in Aulide) is the last of the extant works by the playwright Euripides. Written between 408, after Orestes, and 406 BC, the year of Euripides' death, the play was first produced the following year[2] in a trilogy with The Bacchae and Alcmaeon in Corinth by his son or nephew, Euripides the Younger,[3] and won first place at the City Dionysia in Athens. The play revolves around Agamemnon, the leader of the Greek coalition before and during the Trojan War, and his decision to sacrifice his daughter, Iphigenia, to appease the goddess Artemis and allow his troops to set sail to preserve their honour in battle against Troy. The conflict between Agamemnon and Achilles over the fate of the young woman presages a similar conflict between the two at the beginning of the Iliad. In his depiction of the experiences of the main characters, Euripides frequently uses tragic irony for dramatic effect. J.K. Rowling spurred just this past week a series of explanations about just exactly what is a blockchain coin worth ... and why is it so; her final words on the subject (artistic liberty taken, obviously not the last she'll say of this magic moment) "I don't think I trust this." Taken directly from an off the cuff email to ARXM titled: "Slow the S is ... 
our Hypothes.is" I imagine I'll be adding some wiki/ipfs stuff to it--and try to keep it compatible; the design and layout is almost exactly what I was dreaming about seeing--as a "first rough draft product." Lo, and behold. It's been added to the many places I host my tome; the small compilation of nearly every important email that has gone out ... all the way back to the days of the strange looking Margarita glass ... that now very much resembles the "Cantonese character 'le'" which I've come to associate with a "handle" on multiple corners of a room--something like an automatic coat rack conveyor belt connecting different versions of "what's in the box." I'm planning on using that symbol 了 to denote something like multiple forks of the same page. Obviously I'm thinking forward to things like "the Transhumaist Chain Party" (BDSM, right?)'s version of some particular piece of legislation, let's say everything starts with the sprawling "bulbing" of "Amendment M" ideas and specific verbiage ... and then we'll of course need some kind of new git/subversion/cvs style version control mechanism to merge intelligently into something that might actually .... really should ... make it into that place in history--the first constitutional amendment ratified by a "Continental Congress of All People" ... but you could also see it as an ongoing sort of forking of something like the "wikipedia page" on what some specific term, say "technocracy" means, and how two parties might propagandize and change the meaning of such thing; to suit the more intelligent and wise times we now live in. For instance, we might once have had a "democracy" and a "democractic" party that had some Anarchist Cook Book version of the history of it ending in something like Snipes and Stallone's "DEMOLITION MAN." Just kidding, we all know "democracy" has everything to do with "d is cl ... and not th" ... to be the them that is the heart of the start of the first true democracy. 
At least the first one I've ever seen, in my old "to a republic" ... style. As it is you can play around with commenting and highlighting and annotating all the stuff I've written and begged and begged for comments on--while I work on layering the backend to to perma-store our ideas and comments on both a blockchain (probably a new one; now that i've worked a little with ethereum) with maybe some key-merkle-tree-walk-search stuff etched into the original Rinkeby ... and then of course distributed data in the "public owned and operated" IPFS. To be clear, I plan on rewriting the backend storage so that we will have a permanent record of all comments; all versions of whatever is being commented on; and changes/revisions to those documents--sort of turning the web into a massive instant "place of collaboration, discussion, and co-authoring" ... if you use the wonderful LEGO pieces that have been handed to us in ideas from places like me, lemma--dissenter, and of course hypothes.is who has brought you and i such a polished and nice to look at "first draft" of something like the living Constitution come repository of all human knowledge. I do sort of secretly wich they would have called this project something like "annotating and reflecting (or real or ...) knowledge" just so the movement could have been called ARK. ... or something .... but whatever join the "calling you a reporter" group or ... "supposedly a scientist?" NOIR INgR .. I CITE SITE OF ENUDRICAM; a rekindling of the dream of a city appearing high above in the sky, now with a boldly emblazened smiling rainbow and upsidown river ... specifically the antithesis of "angel falls," there's a lagoon too--actually a chain of several ponds underneith the floating rock ... and in some versions of this waking dream there are rings around the thing; you might imagine an artificial set of centripetal orbitals something like a fusion of the ring Eslyeum and the "Six-Axis ride" of the JKF Center's "Spacecamp." 
I write as I dream, and though I cannot for certain explain exactly how; it's become a strong part of my mythology that this spectacular rendition of "what ends the silence" has something to do with the magical delivery of "a book" ... something not of this Earth but an unnatural thing; one I've dreamt of creating many times. This book is something like the DSM-IV and something like a Merck diagnostic manual; but rather than the old antiquated cures of "the Norse Medgard" this spectacle nearly "itsimportant" autoprints itself and lands on something like every doorpost; what it is is a list of reasons why "simply curing all disease" with no explanation and no conversation would be a travesty of morality--how it would render us half-blind to the myriad of new solutions that can come from truly understanding why "ITIS" to me has become a kind of magical marker: an "it is special" as in, it's cure could possibly solve a number of other problems. Through that missing "o," English on the ball, we see a connection between a number of words that shine bright light including Exodus itself which means "let there be light," the word for Holy Fire and the Burning Bush.. .reversed to hSE'Ah, and a story about the Second Coming parting our holy waters. This answer connects the magical Rod's of Aaron in Exodus and the Iron Rod of Jesus Christ to the Sang Rael itself... in a fusion that explains how the Periodic Table element for Iron links not just to Total Recall and Mars, but also to this key my dream of what the first day of the Second Coming might be like; were the Rod of Christ... in the right hands. In a story that also spans the Bible, you might understand better how stone to bread and your input make all the difference in the world between Heaven and Adam's Hand. Once more, what do you think He ....   
Since the very earliest days of this story, I have asked for better for you, even than see Nearly all of the original parts of the original "post-origination dream" remain intact; there's a walkway that magically creates new paths and "attractions" based on where you walk, something like an inversion of the artificial intelligence term "a random walk down a binary tree" ... for instance going left might bring you to the Internet Cafetornaseum of the Earl of Sandwich; and going to the right might bring you to the ICIMAX/Auditorium of Science and Discovery--there's a walkway to "Magical GLAS D'elevators" that open a special "instantiation" of the Japan Room of the Potter and the Toolmaker ... complete with a special [second level and hidden staircase] Pool of Bethesdaibo verily delivering something like youth of mind and body ... or at least as close to such a thing as a sip of Holy Water or Ambrosia or a dip in the pool of Coccoon and Ponce De'Leon could instantly bring ... to those that have seen Jupiter Ascending ... the questions of "nature versus nurture" and what it means to be "old and wise" and "young at heart" truly mean--- https://www.youtube.com/watch?v=M8CyN1awWls https://link.springer.com/chapter/10.1057/9780230366688_16 https://www.youtube.com/watch?v=YDo5zvYNn3A Somewhere between the outdoor rafting ride and the level with the special "ballroom of the ancient gallery" ... perhaps now being named or renamed or recalled as something about "Face [of] the Music" lies a magical "mini-maize" ... a look at a mock-up (or #isitit) of Merlink and Harthor's "round table" that displays a series of ... (at least to me) magical appearing holographic displays and controls that my dreams have stolen from Phillip K. Dick's Minority Report and something of what I hope Microsoft's Dynamics/Hololens/Surface will become---a series of short "focus groups" .... to guage and discuss the information in the "CITIES-D5AM-MERCK" ... 
how to end world hunger and nearly all disease with the press of a magical buzzer--castling churches to something like "political-party-town-hall-meeting centers" and replacing jails and prisons and hospitals with something like the "Hospitalier's PRIDE and DOJOY's I practiced "Kung-fun-dance" ... a fusion of something like a hotel and a school that probably looks very much like a university with classrooms and dorms and dining hall's all fit into a single building. I imagine a series of 2 or 3 "room changes" as in you walk from the one where you get the book and talk about it ... to the one where you talk about "what everyone else said about it" and maybe another one that actually connects you to other people with something like Facebook's Portal; the point of the whole thing to really quickly "rubber stamp" the need for an end to "bars in the sky" nonalcoholic connotation--as in "overcoming the phrase the sky is the limit" and showing us the need for a beacon of glowing hope fulfilled--probably actually the vision of a holographic marker turning into actual rings around the single moon of Earth, the focus of the song annoucing the dawn of the age of Aquarius--- It might lead us also to Ceres; and another set of artificial rings, or to Monoceros and a rehystorical understanding of the birthplace and birthing of the "river roads" that bridge the "space gaps" in the galaxy from our "one giant leap for mankind" linking the Apollo moon landing to the mythological connection to the sun; and connecting how the astrological charts of the ancients might detail a special kind of overlapping--the link between Earth's SOL and something like Proxima or Alpha Centauri; and how that "monostar bridge" might overlap to Orion and from there through Sagitarius and the center of the Milky Way ... all the way to Andromeda and more dreams of being in a place where there's a map to a tri-galactic system in the constellation Cancer and a similar one in Leo ... 
and just incase you haven't noticed it--a special marker here, I thought to myself it might be cool to "make an acronymic tie to Monoceros" and without even thinking auto-wrote Orion (which was the obvious constellation next to Monoceros, in the charts) and then to Sagitarrius; which is the obvious ... heart of our astrological center and link to "other galaxies." ----I've dreamt or scriven or reguessed numerous times how the Milky Way's map to an "Atlas marked through time by the ages and the ancients" might tie this place and this actual map to the creation of the railways between stars to the beginning and the end of time and of course to this message that links it all to time travel. There's a few "guesses" I've contemplated; that perhaps the Milky Way chart is a metal-cosmic or microcosmic map to the dawn of time in the galactic vision of ... just after the big bang; or it might tie to a map of something like the unthinkable--a civilization that became so powerful it was able to reverse the entropy of "cosmic expansion" and reverse the thing Asimov wrote of in "The Last Question" as the end of life and the ability to survive basically due to "heat loss." "The Last Question." (And if you read two, why not "The Last Answer"?). Find these readings added to our collection, 1,000 Free Audio Books: Download Great Books for Free. https://archive.org/details/texts http://zlibraryexau2g3p.onion.pet/ Looking for free, professionally-read audio books from Audible.com, including ones written by Isaac Asimov? * all "asterisks" in the abovə document denote a sort of Adamic unspoken relationship between notations and meanings; here adding the "Latin word for three" and source of the phrase "t.i.d." (which is doctor/pharmacy latin for "three times a day") where the "t" there is an abbreviation of "ter" ... 
and suppose the link between K and 11 and 3 noting its alphanumeric position in the English alphabet as the 11th letter and only linking cognitively to three via the conversion between hex, and binarryy ... aberrative here is the overlapping "hakkasan" style (or ZHIV) lack of mention of the answer in "state of Kansas" and the "citystate of Slovakia" as described in the ICANN document linked [in] the related subsection or slice of the word "binarry" for the state of India. Tetris could be spelled with the addition of only a single letter [in] "tea"---the three letters "ris" are the hearts of the words "Christ" and "wrist" [and arguably of Osiris where you also see the round table character of the solar-system/sun glyph and the chemical element for The Fifth Element (as def. by i) via "Sinbad" and "Superman." The ERIS Free Network should also be mentioned here in connection with the IRC network I associate in the place between skipping stones and sacred hearts defined by "AOL" and "Kdice" in my life. In the lexicon of modern HTML, curly braces are generally relative to "classes" and "major object definitions (javascript/css)" while square brackets generally only take on computer-interpreted meaning in "Markdown" which is clearly (by definition, by this character set "[]") a superset (or at least definitely not a subset) of HTML. Dr. Will Caster (Johnny Depp) is a scientist who researches the nature of sapience, including artificial intelligence. He and his team work to create a sentient computer; he predicts that such a computer will create a technological singularity, or in his words "Transcendence". His wife, Evelyn (played by Rebecca Hall), is also a scientist and helps him with his work. Following one of Will's presentations, an anti-technology terrorist group called "Revolutionary Independence From Technology" (R.I.F.T.) shoots Will with a polonium-laced bullet and carries out a series of synchronized attacks on A.I. laboratories across the country. 
Will is given no more than a month to live. In desperation, Evelyn comes up with a plan to upload Will's consciousness into the quantum computer that the project has developed. His best friend and fellow researcher, Max Waters (Paul Bettany), questions the wisdom of this choice, reasoning that the "uploaded" Just from my general understanding and memory "st" is not ... to me (specifically) an abbreviation of "state" but "ste" is a U.S. Postal code (also "as I understand it") for the name of a special room or set of rooms called a "suite" and in Adamic "connotation" I sometimes read it as "sweet" ... which has several meanings that range from "cool" to "a kind of taste sensation" to "easy to sway or fool." If you asked me though, for instance if "it" was an abbreviation or shorthand notation or acronym for either "a United state" or "saint" ... you'd be sure. While it's clear from studying linguistic cryptography ... (If I studied it a little here and some there, its also from the "universal translator of Star Trek") and the personal understanding that language is a kind of intelligent code, and "any code is crackable" ... that I caution here that "meaning" and "face value" often differ widely and wildly ... even in the same place or among the same group of people ... either varying over time or heritage. Menelaus, in Greek mythology, king of Sparta and younger son of Atreus, king of Mycenae; the abduction of his wife, Helen, led to the Trojan War. During the war Menelaus served under his elder brother Agamemnon, the commander in chief of the Greek forces. When Phrontis, one of his crewmen, was killed, Menelaus delayed his voyage until the man had been buried, thus giving evidence of his strength of character. After the fall of Troy, Menelaus recovered Helen and brought her home. Menelaus was a prominent figure in the Iliad and the Odyssey, where he was promised a place in Elysium after his death because he was married to a daughter of Zeus. 
The poet Stesichorus (flourished 6th century BCE) introduced a refinement to the story that was used by Euripides in his play Helen: it was a phantom that was taken to Troy, while the real Helen went to Egypt, from where she was rescued by Menelaus after he had been wrecked on his way home from Troy and the phantom Helen had disappeared. https://www.britannica.com/topic/Menelaus-Greek-mythology
The Lion Gate at Mycenae, the only known monumental sculpture of Bronze Age Greece. Coordinates: 37°43′49″N 22°45′27″E
Mycenae (Ancient Greek: Μυκῆναι or Μυκήνη, Mykēnē) is an archaeological site near Mykines in Argolis, north-eastern Peloponnese, Greece. It is located about 120 kilometres (75 miles) south-west of Athens; 11 kilometres (7 miles) north of Argos; and 48 kilometres (30 miles) south of Corinth. The site is 19 kilometres (12 miles) inland from the Saronic Gulf and built upon a hill rising 900 feet (274 metres) above sea level.[2] In the second millennium BC, Mycenae was one of the major centres of Greek civilization, a military stronghold which dominated much of southern Greece, Crete, the Cyclades and parts of southwest Anatolia. The period of Greek history from about 1600 BC to about 1100 BC is called Mycenaean in reference to Mycenae. At its peak in 1350 BC, the citadel and lower town had a population of 30,000 and an area of 32 hectares.[3]
3. Chew 2000, p. 220; Chapman 2005, p. 94: "...Thebes at 50 hectares, Mycenae at 32 hectares..."
https://en.wikipedia.org/wiki/Clymene_(mythology) Melpomene (/mɛlˈpɒmɪniː/; Ancient Greek: Μελπομένη, romanized: Melpoménē, lit. 
'to sing' or 'the one that is melodious'), initially the Muse of Chorus, she then became the Muse of Tragedy, for which she is best known now.[1] Her name was derived from the Greek verb melpô or melpomai meaning "to celebrate with dance and song." She is often represented with a tragic mask and wearing the cothurnus, boots traditionally worn by tragic actors. Often, she also holds a knife or club in one hand and the tragic mask in the other. Melpomene is the daughter of Zeus and Mnemosyne. Her sisters include Calliope (muse of epic poetry), Clio (muse of history), Euterpe (muse of lyrical poetry), Terpsichore (muse of dancing), Erato (muse of erotic poetry), Thalia (muse of comedy), Polyhymnia (muse of hymns), and Urania (muse of astronomy). She is also the mother of several of the Sirens, the divine handmaidens of Kore (Persephone/Proserpina) who were cursed by her mother, Demeter/Ceres, when they were unable to prevent the kidnapping of Kore (Persephone/Proserpina) by Hades/Pluto. In Greek and Latin poetry since Horace (d. 8 BCE), it was commonly auspicious to invoke Melpomene.[2] See also [AREXMACHINA] Muses in popular culture The Nine Muses Flagstaff (/ˈflæɡ.stæf/ FLAG-staf;[6] Navajo: Kinłání Dookʼoʼoosłííd Biyaagi, Navajo pronunciation: [kʰɪ̀nɬɑ́nɪ́ tòːkʼòʔòːsɬít pɪ̀jɑ̀ːkɪ̀]) is a city in, and the county seat of, Coconino County in northern Arizona, in the southwestern United States. In 2018, the city's estimated population was 73,964. Flagstaff's combined metropolitan area has an estimated population of 139,097. Flagstaff lies near the southwestern edge of the Colorado Plateau and within the San Francisco volcanic field, along the western side of the largest contiguous ponderosa pine forest in the continental United States. The city sits at around 7,000 feet (2,100 m) and is next to Mount Elden, just south of the San Francisco Peaks, the highest mountain range in the state of Arizona. 
Humphreys Peak, the highest point in Arizona at 12,633 feet (3,851 m), is about 10 miles (16 km) north of Flagstaff in Kachina Peaks Wilderness. The geology of the Flagstaff area includes exposed rock from the Mesozoic and Paleozoic eras, with Moenkopi Formation red sandstone having once been quarried in the city; many of the historic downtown buildings were constructed with it. The Rio de Flag river runs through the city. Originally settled by the pre-Columbian native Sinagua people, the area of Flagstaff has fertile land from volcanic ash after eruptions in the 11th century. It was first settled as the present-day city in 1876. Local businessmen lobbied for Route 66 to pass through the city, which it did, turning the local industry from lumber to tourism and developing downtown Flagstaff. In 1930, Pluto was discovered from Flagstaff. The city developed further through to the end of the 1960s, with various observatories also used to choose Moon landing sites for the Apollo missions. Through the 1970s and '80s, downtown fell into disrepair, but was revitalized with a major cultural heritage project in the 1990s. The city remains an important distribution hub for companies such as Nestlé Purina PetCare, and is home to the U.S. Naval Observatory Flagstaff Station, the United States Geological Survey Flagstaff Station, and Northern Arizona University. Flagstaff has a strong tourism sector, due to its proximity to Grand Canyon National Park, Oak Creek Canyon, the Arizona Snowbowl, Meteor Crater, and Historic Route 66. #PSANSDISL #LWDISP either without gas or seeing cupidic arroz in "thank you" or "allta, wild" ... pps: a magnanimous decision ... I stand here on the brink of what appears to be total destruction; at least of everything I had hoped and dreamed for ... for the last decade in my life which appears literally to span thousands of years if not more in the eyes of some other beholder. 
I spent several months in Kentucky telling a story of a post apocalyptic and post-cataclysmic delusion; some world where I was walking around in a "fake plane" something like a holodeck built and constructed around me as I "took a walk around the world" to ... it did anything but ease my troubled mind. Recently a few weeks in Las Vegas, and a similar story; telling as I walked penniless down the streets filled with casinos and anachronistic taxi-cabs ... some kind of vision of the entirety of the heavens or the Earth or the "choir of angels" I think of when I echo the words Elohim and Aesir from mythology ... there with me in one small city in superposition; seeing what was a very well put together and interesting story about a "star port" Nirvane ... a place that could build cities into the face of mountains and half working monorails appearing in the sky---literally right before my eyes. I suppose this is the place "post cataclysm" though I still have trouble understanding what it is that's actually about ... in my mind it connects to the words "we are losing habeas" echoed from the streets of Los Angeles in a more clear and more military voice than usual--as I walked block by block trying to evade a series of events that would eventually somehow connect all the way to the "outskirts of Orlando, Florida" in a place called Alhambra. Apparently the name of a castle; though I wasn't aware of that until much later. It doesn't feel at all like a "cataclysm" to me; I see no great rift--only a world filled with silent liars, people who collectively believe themselves to have stolen something--something gigantic--at least that's the best interpretation of the throes and impetus behind the thing that I and mythology together call Jormungandr. With an eye for "mythological connections" you could clearly see that name of the Great Serpent of Revelation connects to something like the Unseelie; the faeries of Gaelic lore. 
To me though this world seems still somewhat fluid, it's my entire life--moving from Plantation to a place where the whole of it might be Bethlehem and to "clear my throat" it's not hard to see here how that land of "coughs" connects to the Biblical land of Nod and to the "Adamically sieved" Snifleheim ... from just a little twist on the ancient Norse land most probably as close to Hel as anyone ever gets--or so I dream and hope---still today. It all looks so real and so fake at the same time; planned for thousands of generations, the culmination of some grand masterpiece story that certainly ties history and myth and reality into a twisted heap of "one big nothing, one big nothing at all." I've tried to convey to the world how important I believe this place and this time to be--not by some choice of my own ... but through an understanding of the import of our history and the impact of having it be so obviously tuned and geared towards this specific time ... many thousands of years literally all focused on a single moment, on one day or one hour or even just a few years where all of that gets thrown down on the table as if some trump card has been played--and whether or not you fathom the same magnanimous statement or situation or position ... to me, I think it depends on whether or not you grew up in the same kind of way, believing our history to be so fixed and so difficult to change. I don't particularly feel like that's the "zeitgeist" of today; I feel like the children believe it to be some kind of game, and that it is such an easy thing to "sed" away or switch and turn into something else--another story, another purpose ... anyone's personal fantasy land come true. 
I don't think that's the case at all, it's clearly a personal nightmare; and it's clearly one we've seen time and time again--though not myself--the Jesus Christ that is the same yesterday, today; and once again perhaps echoing "no tomorrow" never remembers or believes that we've "seen it all before" or that we've ever really gotten the point; the thing you present to me as "factual reality" is a sickness, it disgusts me; and I'd do anything to go back to the world "where I was so young, and so innocent" and so filled with starry-eyed hope that we were at the foot of something grand and amazing that would become an empire turned republic of the heavens; filling the stars ... with the kind of love for kindness and fairness that I once associated very strongly with the thing I still believe to be the American Spirit. "Suddenly it changes, violently it changes" ... another song echoes through the ages--like the "words of the prophets dancing ((as light)) through the air" ... and I no longer even have a glimmer of hope that the thing I called the American People still exist; I feel we've been replaced by some broken container of minds, that the sky itself has become corrupt to the point that there's no hope of turning around this thing that I once believed with all my heart and all my mind was so obviously a "designed downward spiral" one that was---again--so obviously something of a joke, intended to be easy to bounce off a false bottom and springboard beyond "escape velocity" and beyond the dark waters of "nearest habitable star systems (being so very far away)" into a place where new words and new ideas would "soar" and "take flight." Here though; I am filled with a kind of lonely sadness ... staring at what appears to be the same mistake(s) happening over and over again; something I've come to call "skipping stones in the pond of reality" and really do liken it to this thing that appears to be the new meaning of "days" and ... 
a civilization that spends absolutely no love or lust to enter a once sacred and holy place and tarnish it with their sick beliefs and their disgusting desires. You all ... you appear to be some kind of springboard to "bunt" forth yet another age or era of nothingness into the space between this planet and "none worth reaching" and thank God, out of grasp. Today, I'd condemn the entirety of this world simply for its lack of "oathkeepers" and understanding of what the once hallowed words of Hippocrates meant to ... to the people charged and dharmically required to heal rather than harm. It appears the place and time that was once ... at least destined to be the beginning of Heaven ... has become a "recurring stump" of some future unplanned and tarnished by many previous failed efforts and attempts to overcome this same "lack of conversation or care" for what it meant to be "humane" in a world where that was clearly set high aloft and above "humanity" in the place where they--where we were the best nature had to offer, the sanest, the kindest; the shining last best hope. Today I write almost every day ... secretly thanking "my God" for the disappearance of my tears and the still small but bright hope that "Tearran" will one day connect the Boston Tea Party and the idea that "render to Caesar" and Robin of Loxley ... all have something to do with a re-ordering of society and the worth and import of "money" ... to a place that cares more for freedom from murder than it does ... "freedom from having to allow others to hear me speak." I hold back tears and emotions; not by conscious choice or ability but ... still with that strange kind of lucky awkward smile; and secretly not so far below the surface it's the hope of "a swift death" that ... that really scares me more than the automatons and mechanical responses I see in the faces of many drivers as they pass me on the street--the imagery of connecting it to the serpentine monster of the movie Beetlejuice ... 
something I just "assume" the world understands and ... doesn't seem to fear (either); as if Roosevelt had gotten it all wrong and backwards--the only thing you have to fear, is the loss of fear of "loss." Here my crossroads---halfway between the city my son lives in and the city my parents live in--it's on making a decision on whether I should continue at all, or personally work on some kind of software project I've been writing about, or whether I should focus on writing about a "revolution" in government and society that clearly is ... "somewhat underway." In my mind it's obvious these things are all connected; that the software and the governance and the care of whether or not "Babylon" is remembered as a city of great laws and great change or a city of demons and depravity ... that these things all hinge and congeal around a change in your hearts; hoping you will choose to be the beginning of a renaissance of "society and civilization" rather than the kings and queens of a sick virtual anarchy ... believing yourselves to have stolen "a throne of God" rather than to literally be the devastating and demoralizing depreciation of "lords and fiefdoms" to something more closely resembled by the time of the Four Horsemen depicted in Highlander. These words intended to be a "forward" to yet another compliment of a ((nother installment of a partial)) chain of emails; whimsically once half-joking ... I called it the Great Chain of Revelation. The software too; part of the great chain, this "idea" that the blockchain revolution will eventually create a distributed and equal governance structure, and a rekindling of monetary value focused on "free and open collaboration" rather than "survival of the most unfit"--something society and civilization seem to have turned the "call of life" from and to ... literally just in the last few years as we were so very close to ... reaching beyond the Heaven(s). 
I don't think it's hard to imagine how a "new set of ground rules" could significantly change the "face of a place" -- make it something shiny and new or even on the other side of the coin, decayed or depraved. It's not hard to connect the kind of change I'm hoping for with "collision protection" and "automatic laws" to the (perhaps new, perhaps ... ancient) Norse creation story of the brothers of Odin: Vili and Ve. It might be hard to see today how a new "kind of spiritual interaction" might be only a few "mouse clicks" away though--how it could change everything literally in a flash of overnight sensation ... or how it might take something like a literal flash of stardom (or ... on the other hand, something like totalitarian or authoritarian "iron fisting") to make a change like this "ubiquitous" or ... something like the (imagined in my mind as ... messianic) "ED" of storming through the cosmos or the heavens and turning something that might appear to be "free and perfect feeling" today into a universe "civilized overnight" and then ... I wonder how long it would take to laud a change like that; for it to be something of a voluntary "reunderstanding" of a process ... to change the meaning of every word or every thought that connects to the process of "civilization" to recognize that something so great and so powerful has happened as to literally change the meaning of the word, to turn a process of civilization into something that had a ... "signta-lamcla☮" of foreboding and then a magical staff struck into the heart of a sea and then ... and then the word itself literally changes to introduce a new "mid term" or "halfway point" in which a great singularity or enlightenment or change in perspective or understanding sort of acknowledges ... that some "clear outside" force not only intervened on the behalf of the future and the people of our world but that it was uniquely involved in the whole of-- "waking up" tio a nu def of #Neopoliteran. 
^Like the previous notation; the below text comes from an email previously sent; and while i stand behind things like my sanity, my words; and my continued and faithful attempt to speak and convey both a useful and helpful truth to the world---sometimes just a single day can make all the difference in the world. Sometimes it's just a single moment; a flash or a comment about ^th@ blink of an eye" ... and I've literally just "thought up/had/experienced/transitioned thru" that exact moment. The lies standing between "communication" and either "cooperation" or .... some other kind of action have become more defined. More obvious. Because of this clarification; like a kind of "ins^tant* gnosis" ... search high and lo ... the depths all the way to above the heavens ... for a festive divorce ceremonial ritual ... that looks something like a bachelor party ':;] — @amrs@koyu.SPACe ... @suzq@rettiwtkcuf.social (@yitsheyzeus) May 22, 2020 I ... TERON; Gjall are painting me into a corner here; and I don't see around it anymore--I don't see the light, and I don't see the point. I was a happy-go-lucky little kid in my mind; that's not "what I wanted to be" or what I wanted to present, it's who I was. I saw "Ashkenazi" and ... know I am one of those ... and I kind of understood that something horrible might have happened, or might happen here--and I kind of understand that crying smashing feeling of "to ash" that echoes through the ages in the potpourri songs about pockets full of Parker Posey .. and ancient Psalms about "from the ashes of Edom" we have come--and from that you can see the cyclical sickness of this ... place so sure it's "East of Eden" and yet gung-ho on barrelling down the same old path towards ash and towards Edom and towards ... more of Dave's "ashes to ashes dust to dust" and his "smoke clouds roll and symphony of death..." 
and few words of solace in a song called Recently that I imagine was fleeting and has recently come and gone--people stare, I can't ignore the sick I see. I can't ignore his "... and tomorrow back to being friends" and all but wonder who among us doesn't realize it's "ash" and "gone" and "no memory of today" that's the night between now and ... a "tomorrow with friends" not just for me--but for all of you--for this place that snickers and pantomimes some kind of ... anything but "I'm not done yet" and "there's more ... vendetta ... and retribution to be had, Adam ... please come back in a few more of our faux-days." This is sickness; and happy-go-lucky Himodaveroshalayim really doesn't do much but complain about that word, the "sickle" and the tragic unavoidable ... ash of it all ... these days--you'd think we could "pull out" of this mess, turn another way; smile another day, but it seems there's only one way to get to that avenu in the mind of ... "he who must not know or be me." I have to admit I found some joy in the epiphany that the hidden city of Zion and it's fusion with the Namayim' version of how that "Ha" gels and jives with the name Abraham and the Manna from Heaven and the bath salt and the tina and the "am in e" of amphetamine--maybe a glimmer or a shimmer or a glow of hope at the moment "Nazion" clicked ... and I said ... "no, not me ... 
I'm nothing like a king, no dreams of authoritarianism at all in the heart of Kish@r;" even as I wrote words that in the spirit of the moment were something of a "tis of a'we" that connected to my country and the first sing-songy "tisME" that I linked to trying to talk in the rhyming spirit of some "first Christ" that probably just like me was one limmerick away from the end of the rainbow and one "Four Non Blondes" song away from tying "or whatever that means" and this land crowned with "brotherhood" (to some personal "of the Bell, and of the bell towers so tall and Crestian") to just one Hopp skip and jump away from the heart of the obvious echoes of a bridge between haiku and Heroku... a few more gears shift into place, a click and and a mechanical turn of the face of the clock's ku-ku striking ... it was the word "Earthene" that was the last "Jesusism" around the post Cimmerian time linking Dionysus and Seuss to that same "su-s" that's belonging to a moment in the city of Uranus--codified and etched in stone as "MCO"--not just for its saucer and warp nacelles and "deflector dish" but for it's underground caverns and it's above ground "Space Mountain" and that great golf ball in the heart of it all. The gears of time and the dawns of civilizequey.org query the missing "here" in our true understanding of what "in the beginning, to hear; to here ... to rue the loss of the Maize from Monoceros to the VEGA system and the tri-galactic origin of ... "some imaginary universal ... Earthene pax" to have dropped the ball and lost it all somewhere between "Avenu Malkaynu" and melaleuca trees--or Yggrasil and Snifleheim--or simply to miss the point and "rue brickell" because of bricks rather than having any kind of love or nostalgia linking to a once cobblestone roadway to the city in the Emerald skies paved in golden "do not return" signs ... 
to have lost Avenues well after not realizing it was "Heaven'es that were long gone far before I stepped foot on this road once called too Holy for sandals" in a place where that Promised Land and this place of "K'nanites" just loses it's grip on reality when it comes to mentioning the possibility that the original source and story of Ca'anan was literally designed to rid the world of ... "bad nanites" and the mentality of ... vindictiveness that I see behind every smirk. The final hundred nanoseconds on our clock towards doom and gloom cause another bird to fly; another snake to curl up and listen again to the songs designed to charm it into oblivion; whether that's about a club in South Beach or a place not so far from our new "here..." all remains to be seen in my innocent eyes wondering what it truly is that stands between what you are ... and finding "forgiveness not needed--innocent child writes to the mass" ... and the long arm of the minute hand and the short finger of the hour for one brief moment reconcile and move towards "midnight" together; and it's simply idyllic, the Nazarene corner between nil and null you've relegated the history of Terran poast futures into ... "foreves mas" or so they (or you) think. I'm still so far from "Five Finger Death Punch" though; and so far from Rammstein and so far from any kind of sick events that could stand between me and "the eternal" and change my still "casual alternative rock" loving heart to something more death metal; I rue whatever lies between me and there being any kind of Heaven that thinks there could exist a "righteous side" of Hell and it... simultaneously. I still see light here in admonishing the masses and the angels standing against the story and the message God brings us in our history. I still see sparks in siding with the "causticness" of "no holodecks in sight" and the hunger and the pain of simulating ... 
"the hells of reality" over the story of decades or centuries of silence refusing to see "holography" and "simulated" in the word Holocaust and the horrors of this place that simply doesn't seem to fathom or understand the moments of hunger pangs and the fear of "dark Earth pits" or towers of "it's not Nintendo-DS" linking the Man in the High Castle to an Iron Mask. I rally against being what I clearly am raised high on some pedestal by some force beyond my comprehension and probably beyond that of the "perfect storm in time" that refuses to itself acknowledge what it means to gaze at such an unfathomable loss of innocence at the cost of a "happy and serene future" or even at the glimmer of the Never-Never-Land I'd hoped we would all cherish and love and share ... the games and the newfound freedom that comes not just from "seeing Holodeck" turn into "no bullets" and "no cages" but into a world that grows and flourishes into something that's so far beyond my capability to understand that I'm stuck here; dumbfounded; staring at you refusing to stop car accidents and school shootings ... because "pedestal." For the "fire and the glory" of some night you refuse to see is this one--this place where morality rekindles from ... from what appears tobe one small candle, but truly--if it's not in your heart, and it's not coming from some great force of goodness--fear today and a world of "forever what else may come." Here in a place the Bible calls Penuel at the crossing of a River Jordan ... the Angel of the Lord notes the parallels in time and space between the Potomac and the Rhine--stories of superposition and cities and nation-states that are nothing more than a history of a history of things like the Monoceros "arroz" linking not just to the constellation Orion but to Sagittarius and to Cupid and of course to the Hunter you know so well-- Searching for a Saturday; a sabbath to be made Holy once more ... 
"at the Rubycon" The Einstein-Rosen Wormhole and the Marshall-Bush-JFKjr Tunnel The waters are called narah, (for) the waters are, indeed, the offspring of Nara; as they were his first residence (ayana), he thence is named Narayana. — Chapter 1, Verse 10[3] In a semi-fit of shameless arexua-self recognition i'm going to mention Amazon's new series "Upload" and connect it to the PKD work that my Martian-in-simulcrum-ciricculum-vitae on "colonization education" ... tying together Transcendance, Total Recall and ... well; to be honest it actually gave me another "uptick" in the upbeat ... maybe i'll stick around until I'm sure there's at least one more copy of me in the ivrtual-invverse ... oh, that reminds me ... Farmer)'s Lord of Opium also touches on this same "mind of God in the computer" subject (which of course leads to Ghost in the Shell and Lucy--thanks Scarlette :). While I'm listing Matrix-intersected pieces of the puzzle to No Jack City, Elon Musk's neuralace and Anderson's Feed are also worth a mention. Also the first link in this paragraph is titled ... "the city of the name of time never spoken after time woke up and stfu'd" (which of course is the primary subject of this ... update to the city Aerosol). The ... "actual original typed dream" included a sort of "roller coaster ride" through space all the way to Mars; where the real purpose of "the thing" I am calling the "Mars Hall" was to display previous victories and failures ... and the introduction of "older or future" culture's suggestions for "the right way" to colonize a new habitat. If it were Epcot Center, this would be something like SpaceMountain taking you to to the foture of "Epcot Countries" as if moving from "countries" to planets were as easy as simply ... "reading backwards." THE SOFTWARE, SINGERS, AND SHIELD(S) OF HEIROSOLYMITHONEYY Thinking just a little bit ahead of myself, but I'm on "Unreal Object/Map Editor within the VR Server" and calling it something like "faux-wet-ware" ... 
which then of course leads to a similar onomatopoeia of "weapons and ..." wherewithal to find a better singer's name to connect the road of "sword" to a Wo'riordan ... but I think that fusion of warrior and woman probably does actually say ... enough of it all; on this road to the living Bright Water that the deity in my son's middle name defines well here, as "waking up," stretching its tributaries and its winding wonders and wistfully .... Narayana (Sanskrit: नारायण, IAST: Nārāyaṇa) is known as one who is in yogic slumber on the celestial waters, referring to Lord Maha Vishnu. He is also known as the "Purusha" and is considered the Supreme being in Vaishnavism. andromedic; the ports of call ... to the mediterranean (literally) from the gulf coast; ... who engages in the creation of 14 worlds within the universe as Brahma when he deliberately accepts rajas guna, himself sustains, maintains and preserves the universe as Vishnu by accepting sattva guna. Narayana himself annihilates the universe at the end of maha-kalp ... . there's no place like home. there's no place like home. there's no place like home. and so it begins ... "f: r e l i g i o n find out what it means to me. faucet, every single one, stream of purity ... from Fort Myers ... f ... flicks ... Flint. " ^this notation will from this email forward in linear time denote some form of contact method or information related to the context of the message you are reading. This particular one sends me an encrypted email. If there is an "@" symbol involved in the "anchor's hypertext reference" (technically an "a href=" in HTML4) your browser should attempt to open an email client to send a message over an anonymous SMTP relay. Understand that "anonymous" in this case may or may not mean your sending email address is hidden or obfuscated--so if you want to receive a reply you must include it in the DATA of your SMTP transmission defined by the RFC5321 attached. 
In most cases "anonymous" also means that you will not have the recipients direct contact information unless they have made it public---additionally the exact server/system/relay used may or may not be the "Sbroken Berkman Perl Script" linked to in the "hypertext reference" specifically anchored to the words "an anonymous SMTP relay" above. A simple "hat character" (^) and the letter "t" as you see beginning the above paragraph will denote a contact method or form that works over the internet using an HTTP protocol defined in a series of RFC's including (but not limited to) RFC's numbered as 2616, 7230, 7235, 2068 and use a simple language which is based on a definition suggested or proposed currently by an organization called the "W3C Consortium" ---and ... previously set and defined by an organiza^tion located at html.spec.whatwg.org; which appears (to me, for the first time as I write these words) to follow the conceptual spirit of the "living document" defined by the several "Continental Congresses, et alia." I personally now conjoin this document in my head to a procession of patrilineal or matrilnear predecessors to the actual event .... still to be defined ... 
but related to this specific email, this mailing list; its contributors and readers as well as actual members of the organization (still to be created, defined, or named) that creates a "round table*" of members that is open to the public, to all voters educated enough to understand the specific issue being voted on (up to a standard that; in this place and time appears to be unset and unmet but materially related to reaching the age of 18 years old; growing up in or being born in the United States of America (related spec.* to the Constitution of the United States of America which is officially "self-defined" through a process which includes all three branches of the government which it also "self-defines" and purports to be "of, for, and by the people"--though the general population is only able to contribute through an indirect process (read: the people cannot directly contribute to the constitution without either running for office (like a senator) or being appointed to a specific government position (like a judge or executive branch public servant). The current state of American representative democracy is the highest standard to which I am currently knowledgeable of "extant*"--and it is specifically substandard, inferior, and "just not good enough" as a comparison to the process required to vote in the organization being "self-defined" through this process. It is my sincere and clear hope that "this process" will result in a legal and moral amendment to the document shown in the previous link and presented by the Legislative Branch of the United States here. 
It is my current and faithful belief that anything else would also be significantly below the standards morally required by "this process" which of course includes over 200 years of American citizenship and (other international relations; i.e., e.g, for "iv" example, id est, exempli gratia) as well as the Sons of Liberty and prior to that contributions from the Crown and the "Parliament and Crown" of the United Kingdom; among others et alea's ifndef: 'swikipedia/et_al.. To note specifically because of lack of personal knowledge and public notoriety (assuming all other requiremnant* achem requirements) alas, babylon. i listened to a man yesterday who was talking about "true heroes" ... he of course noted jesus christ and superman together, suggesting the first was one, and the second just a fiction. he also talked about people like gandhi and "leaders who use non-violent means to "change the world." i at least agree with him on the third, gandhi is a good prototype for some kind of hero. staring at this ... "to be completed" work on tales of two cities, whether from sodom and gomorrah all the way to athens and sparta and perhaps even london and paris--and this particular city, babylon; it stands out as one which truly has no equal or even "mirror" in the history of the world. i suppose i'd add "alexandria" and suggest the library and the laws; things that are fundamental to the ethos of the planet i call "athens." i imagine he did not know "hammurabi's" name; and even today in this place where i ask and do not receive answers; i imagine you still don't connect muhammad or amsterdam ... to this king who in our history is set apart and lifted high on a pedestal of having "codified and written down" laws ... for the very first time. it's almost comical, it took me a paragraph and a sentence to connect "the king and i" to this mirror world, where the bible and the people have most assuredly decided "babylon" is a negative thing or a depraved place. 
"fallen, fallen, is [the city of] babylon the great" ... just a quote from one of my favorite movies; which of course is re-quoting "dante" and/or "the bible" "a dwelling place [of] (the) demons (say), it has become." www.icann.org/news/blog/the-problem-with-the-seven-keys kauri on IPFS: has-abaslom-and-the-ethos-of-arcadia

      12:3 Those who are wise[a] will shine like the brightness of the heavens, and those who lead many to righteousness, like the stars for ever and ever.

      you are offline

      we the people rise again

      safe souls, safe fu


      We the People of Slate ...

      The U.S. Constitution, as you [mighta been, shoulda "come" on ... its somedayrewrϕte it.

      "Politicians talk about the Constitution as if it were as sacrosanct as the Ten Commandments [interjection: spec. it is actually almost exactly related!]. But the document itself invites change and revision. What if the president served only one six-year term instead two four-year terms? What if your state's population determined how many senators represent it? What if the Constitution included a right to health care? We asked legal scholars and Slate readers to cross out what they didn't like in the Constitution and pencil in their hearts' desires. Here's what the document would look like with their best ideas."

      多也了了夕 "with a ~~wand~~ of scheffilara, 并#亦太 he begins ... "I am now on the Staff of Menelaus, the Spears of Longinus and Lancelot; and the name "Mosche ex Nashon."

      Logically the recent mentions of Gilgamesh and the simultaneous 同時 overlaping 場道 of the eventual link between the famous ruling of Solomon on the separation of babies and mothers and waters and land ... to a story of many "two cities" that culminates in a cultural or societal or "evolutionary" link to Sodom and Gomorrah and the city-state of Babylon (and it's Hanging Gardens) and also of course to Paris and Troy and "Masstodon" and city-states [ciudadestado] and perhaps planet-cities; from Cambridge to Cambridge across the "Cable" to see state to "London" ... recently I called it "the city of realms" ... I started out logically intending to link "game theory" and John Nash to the mathematical story of Sputnik and a revival of American physics; but in my usual way of rambling into the woods [I mean neighborhood] of stream of consciousness ... turned into a premonitory discourse of "two cities" and how sometimes even things as obvious as the number of letters in the word "two" don't do a good enough job of conveying ... how and/or why one is simply never enough, and two isn't much better--but in the end a circle ... is drawn; the perfect circle in our imaginary mathematical perfection ... I see a parted "line" in the letter pronounced "tea" (and beginning that word); and two "vee" (pron. of "v") symbols joined together in a word we pronounce as "double-you" ... and symbolically because I know "V" is the Roman Numeral for 5 (five) and I know not how to multiply in Roman numerals--

      It's important to pause; here. I am going to write a more detailed piece on "the two cities" as I work through this maze like crossroads between "them" and "demo..." ... here demorigstrably I am trying to fuse together an evolutionary change in ... lit. biological evolution as well as an echelon leap forward in "self-government" ... in a place where these two things are unfathomable and unspokenly* connected.

      To a question on the idiom; is Babylon about "the law" or "of the land of Nod?"

      "What is democracy" ... the song, Metallica's "ONE" echoes and repeats; as we apparently scrive together the word "THEM" ... I question myself ... if Babylon were the capital city of some mythical Nation of Time ... if it were the central "turning point" of Sheol; ... >|<

      Can you not see that in this place; in a world that should see and does there is a gigantic message proving that we are not in reality and trying to show us how and why that's the best news since ... ever---that it's as simple as conjoining "the law of the land" with a basic set of rules that automatically turn Hell into something so much closer to Heaven I just do not understand---why we can't stand up together and say "bullets will not kill innocent children" and "snowflakes will not start avalanches ...." that cover or bury or hide the road from Earth to Verital)e .... or from the mythical Valis to Tanis---or from Rigel to Beth-El ... "guess?"

      ## as "an easy" answer; I'm looking for a fusion of "law and land" that somehow remembers a "jok'er a scene" about "lawn" seats; and "where the girls are green;"

      It's as simple as night and day; Heaven and Hell ... the difference between survival and--what we are presented with here; it's "doing this right"--that ends the Hell of representative democracy and electoral college--the blindness and darkness of not seeing "EXTINCTION LEVEL EVENT" encoded in these words and in our government's foundation ... *by the framers [not just of the USA; but English .. and every language]*

      ... is literally just as simple as "not caring" or thinking we are at the beginning of some long process--or thinking it will never be done--that special "IT" that's the emancipation of you and I.

      Here words like "gnosis" and "gaudeamus" pair with my/ur "new ntersanding*" of the difference between Asgard and Medgard and really understanding our purpose here is to end "evil" ... things like "simulating disease and pain" (here, simulating meaning ... intentionally causing, rather than "gamifying away") and successfully linking the "Pillars of Hercules" to Plato's vision of Atlantis and the letter sequences "an" and "as" ... unlock a fusion of religion and mythology and "cryptographic truth" that connects "messianic" and "Christian" to "Roman" ... "Chinese" and "American" ... literally the key to the difference between the phrases "we are" and "we were" ....

      in "sight" of "silicon" in simulation and Israel, Genesis, and "silence" ... trying to the raising of Asgardian enlightenment ... and seeing "simple cypher" connecting to "Norse" ...

      and the "I AM THAT" surer than shit ... the intention and design of all religion and creation is to end "simulated reality" and also not seeing "SR" ... in Israel and Norse ... "for instance."

      It's a simple linguistic concept; the "singularity" and the "plurality" of a simple word--"to be"--but it goes to the heart of everything that we are and everything that is around us. This is a message about understanding and preserving individuality as well as liberty; and literally seeing "ARXIV" and understanding "often" and failing to connect God and prescience to "IV" and the Fourth Amendment ... it's about blindness and ... "curing the blind instantly" ... and fathoming how and why this message has been etched into our entire history and all religions and myths and music--to help us "to be THAT we" that actually "are responsible" for the end of Hell.

      • I neglected to mention "Har-Wer" and "Tower of Babel" which are both related linguistically, religiously and topically: "to who ..." and while we're on "four score and [seven years from now]" seeing the fourth "living thing" in Eden and its (the name, Abel) connection to Babel and Abraham Lincoln; slavery and ... understanding we live in a place where the history of the United States also, like Monoceros and "Neil Armstrong's first step" are a time shifted ... overlaid map to achieving freedom ... it's about becoming a father-race ... and actually "doing" the technological steps required to "emancipate the e's of 'me&e'" and survive in exo-planetary space---

      it might be as simple as adding "because we did this" here and now; and having it be something we are truly proud of .... forevermore™ ... for certain in the heart of this story about cyclicality and repetition of error--it's not because we did "this" or something over and over again; it's about changing "the problem" and then helping others to also overcome ... "things like time travel ... erasing speech" --- however that happened.

      • I also failed to mention that "I am in Hell" ... as in this world is hellacious to me; in an overlay with the Hellenic period and this message that we are in the Trojan Horse ... a small gem .... "planet" truly is the Ark of the Covenant---and it's the simple understanding that "reality is hell" is to "living without air conditioning and plumbing is hell" just as soon as you achieve ... "rediscovering" those things---

      • I can't figure out why I am the only person screaming "this is Hell." That's also, Hell.

      ... but recently suggested an old joke about "there being 10 kinds of people in the world (obv an anti-tautology and a tautology simultaneously)" only after that brief bit of singularity and duality mentioning the rest of the joke: "those that understand binary and those that don't know how to base convert between counting with two hands and counting with only an 'on and off.'" It's not obvious if you aren't trying to figure it out, I suppose; but 10 is decimal notation for "kiss" and the "often" without "of" ... and binary notation for the decimal equivalent of "2." A long long time ago in a state that simply non-randomly ties to the heart of the name of our galaxy ... I was again thinking of the "perfect imperfections" of things like saying "three equals one equals one" (which, of course was related to the Holy Trinity and it's "prescient/anachronistic Adamic presence encoded in the name Ab|ra|ha|m" which means "father of a great multitude") ... I brought that one back in the last few months; connecting the letter K and in this "logos-rythmic" tie to the "base of a number system" embellish the truth just a bit and suggest a more accurate rendition of the original [there is no such thing as equality, "is" of separate objects--as in no two snowflakes are the same unless they are literally the same one; true of ancient weights and with the advent of (thinking about) time no two "planets" are the same even if they're the exact same one--unless it's at a fixed moment in time.
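The arithmetic behind the joke can be checked directly. What follows is only a minimal sketch in Python; the helper name `read_digits` is hypothetical and is not part of the original message. It shows the same two digits, "10", read once as decimal and once as binary.

```python
def read_digits(digits: str, base: int) -> int:
    """Interpret a digit string in the given base and return its value."""
    return int(digits, base)


# The everyday reading: "10" in base ten is ten.
ten_as_decimal = read_digits("10", 10)

# The joke's reading: "10" in base two is two.
ten_as_binary = read_digits("10", 2)

# Going the other way: decimal 2 written out in binary digits.
two_in_binary = format(2, "b")

print(ten_as_decimal, ten_as_binary, two_in_binary)
```

The point of the sketch is only that the written symbols stay the same while the value changes with the base, which is what the joke turns on.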

      K=3:11 ... to a handle on the music, the DHD of the gate and the *ring of David's "sling" ...

      ---and that's a relationship of "3 is to 11" as [the SAT style "analogy)]y" as a series of alpha, two mathematic, and two numeric symbols ... may only tie in my mind alone to the books of Genesis and Matthew and the phrase "chapter and verse" and to the stories of Lot and Job ... again in Genesis and the eponymous "Book of Job." So ... "tying up loose ends one 10b [III] iv. " as it appears I've taken it upon myself to call a Job and suggest is my "Lot in life [x]i* [3]"

      • I worry sometimes that important things are missing, or will disappear---for instance Merriam-Webster, which is a "canonical/standard dictionary," should probably have an entry for "lot in life" non-idiomatically as "granny apples to sour apples" as

      2 MANY ALSO ICI; 1two ... following in Mitnick's bold introductory word steps; the curve and the complement ... the missiles and the canoes; the line and the blank space ... "supposedly two examples of two kinds, which could be three not nothings ... Today I write about something monumental; as if as important as the singularity depicted in Arthur C. Clarke's 2001 "A Space Odyssey" ... and remember a day when I thought it very novel and interesting to see the words "stillborn and yet still born" connected in a single piece of writing to "Stillwater and yet still water" ... today adding in another phrase noting the change wrought only by one magical single "space" (also a single capital letter; and a third phrase): "block chains with a great blockchain."

      • https://en.wikipedia.org/wiki/Euripides Iphigenia in Aulis or Iphigenia at Aulis[1] (Ancient Greek: Ἰφιγένεια ἐν Αὐλίδι, Iphigeneia en Aulidi; variously translated, including the Latin Iphigenia in Aulide) is the last of the extant works by the playwright Euripides. Written between 408, after Orestes, and 406 BC, the year of Euripides' death, the play was first produced the following year[2] in a trilogy with The Bacchae and Alcmaeon in Corinth by his son or nephew, Euripides the Younger,[3] and won first place at the City Dionysia in Athens.

      • The play revolves around Agamemnon, the leader of the Greek coalition before and during the Trojan War, and his decision to sacrifice his daughter, Iphigenia, to appease the goddess Artemis and allow his troops to set sail to preserve their honour in battle against Troy. The conflict between Agamemnon and Achilles over the fate of the young woman presages a similar conflict between the two at the beginning of the Iliad. In his depiction of the experiences of the main characters, Euripides frequently uses tragic irony for dramatic effect.

      J.K. Rowling spurred just this past week a series of explanations about just exactly what is a blockchain coin worth ... and why is it so; her final words on the subject (artistic liberty taken, obviously not the last she'll say of this magic moment) "I don't think I trust this."

      Taken directly from an off the cuff email to ARXM titled: "Slow the S is ... our Hypothes.is"

      I imagine I'll be adding some wiki/ipfs stuff to it--and try to keep it compatible; the design and layout is almost exactly what I was dreaming about seeing--as a "first rough draft product." Lo, and behold. It's been added to the many places I host my tome; the small compilation of nearly every important email that has gone out ... all the way back to the days of the strange looking Margarita glass ... that now very much resembles the "Cantonese character 'le'" which I've come to associate with a "handle" on multiple corners of a room--something like an automatic coat rack conveyor belt connecting different versions of "what's in the box." I'm planning on using that symbol 了 to denote something like multiple forks of the same page. Obviously I'm thinking forward to things like "the Transhumanist Chain Party" (BDSM, right?)'s version of some particular piece of legislation, let's say everything starts with the sprawling "bulbing" of "Amendment M" ideas and specific verbiage ... and then we'll of course need some kind of new git/subversion/cvs style version control mechanism to merge intelligently into something that might actually .... really should ... make it into that place in history--the first constitutional amendment ratified by a "Continental Congress of All People" ... but you could also see it as an ongoing sort of forking of something like the "wikipedia page" on what some specific term, say "technocracy" means, and how two parties might propagandize and change the meaning of such thing; to suit the more intelligent and wise times we now live in. For instance, we might once have had a "democracy" and a "democratic" party that had some Anarchist Cook Book version of the history of it ending in something like Snipes and Stallone's "DEMOLITION MAN."

      Just kidding, we all know "democracy" has everything to do with "d is cl ... and not th" ... to be the them that is the heart of the start of the first true democracy. At least the first one I've ever seen, in my old "to a republic" ... style. As it is you can play around with commenting and highlighting and annotating all the stuff I've written and begged and begged for comments on--while I work on layering the backend to perma-store our ideas and comments on both a blockchain (probably a new one; now that i've worked a little with ethereum) with maybe some key-merkle-tree-walk-search stuff etched into the original Rinkeby ... and then of course distributed data in the "public owned and operated" IPFS. To be clear, I plan on rewriting the backend storage so that we will have a permanent record of all comments; all versions of whatever is being commented on; and changes/revisions to those documents--sort of turning the web into a massive instant "place of collaboration, discussion, and co-authoring" ... if you use the wonderful LEGO pieces that have been handed to us in ideas from places like me, lemma--dissenter, and of course hypothes.is who has brought you and i such a polished and nice to look at "first draft" of something like the living Constitution come repository of all human knowledge. I do sort of secretly wish they would have called this project something like "annotating and reflecting (or real or ...) knowledge" just so the movement could have been called ARK. ... or something .... but whatever join the "calling you a reporter" group or ... "supposedly a scientist?"

      NOIR INgR .. I CITE SITE OF ENUDRICAM; a rekindling of the dream of a city appearing high above in the sky, now with a boldly emblazoned smiling rainbow and upside-down river ... specifically the antithesis of "angel falls," there's a lagoon too--actually a chain of several ponds underneath the floating rock ... and in some versions of this waking dream there are rings around the thing; you might imagine an artificial set of centripetal orbitals something like a fusion of the ring Elysium and the "Six-Axis ride" of the JFK Center's "Spacecamp." I write as I dream, and though I cannot for certain explain exactly how; it's become a strong part of my mythology that this spectacular rendition of "what ends the silence" has something to do with the magical delivery of "a book" ... something not of this Earth but an unnatural thing; one I've dreamt of creating many times. This book is something like the DSM-IV and something like a Merck diagnostic manual; but rather than the old antiquated cures of "the Norse Medgard" this spectacle nearly "itsimportant" autoprints itself and lands on something like every doorpost; what it is is a list of reasons why "simply curing all disease" with no explanation and no conversation would be a travesty of morality--how it would render us half-blind to the myriad of new solutions that can come from truly understanding why "ITIS" to me has become a kind of magical marker: an "it is special" as in, its cure could possibly solve a number of other problems.

      Through that missing "o," English on the ball, we see a connection between a number of words that shine bright light including Exodus itself which means "let there be light," the word for Holy Fire and the Burning Bush.. .reversed to hSE'Ah, and a story about the Second Coming parting our holy waters.

      This answer connects the magical Rods of Aaron in Exodus and the Iron Rod of Jesus Christ to the Sang Rael itself... in a fusion that explains how the Periodic Table element for Iron links not just to Total Recall and Mars, but also to this key

      my dream of what the first day of the Second Coming might be like; were the Rod of Christ... in the right hands. In a story that also spans the Bible, you might understand better how stone to bread and your input make all the difference in the world between Heaven and Adam's Hand. Once more, what do you think He ....

      Since the very earliest days of this story, I have asked for better for you, even than see

      Nearly all of the original parts of the original "post-origination dream" remain intact; there's a walkway that magically creates new paths and "attractions" based on where you walk, something like an inversion of the artificial intelligence term "a random walk down a binary tree" ... for instance going left might bring you to the Internet Cafetornaseum of the Earl of Sandwich; and going to the right might bring you to the ICIMAX/Auditorium of Science and Discovery--there's a walkway to "Magical GLAS D'elevators" that open a special "instantiation" of the Japan Room of the Potter and the Toolmaker ... complete with a special [second level and hidden staircase] Pool of Bethesdaibo verily delivering something like youth of mind and body ... or at least as close to such a thing as a sip of Holy Water or Ambrosia or a dip in the pool of Cocoon and Ponce de León could instantly bring ... to those that have seen Jupiter Ascending ... the questions of "nature versus nurture" and what it means to be "old and wise" and "young at heart" truly mean---

      Somewhere between the outdoor rafting ride and the level with the special "ballroom of the ancient gallery" ... perhaps now being named or renamed or recalled as something about "Face [of] the Music" lies a magical "mini-maize" ... a look at a mock-up (or #isitit) of Merlink and Harthor's "round table" that displays a series of ... (at least to me) magical appearing holographic displays and controls that my dreams have stolen from Philip K. Dick's Minority Report and something of what I hope Microsoft's Dynamics/Hololens/Surface will become---a series of short "focus groups" .... to gauge and discuss the information in the "CITIES-D5AM-MERCK" ... how to end world hunger and nearly all disease with the press of a magical buzzer--castling churches to something like "political-party-town-hall-meeting centers" and replacing jails and prisons and hospitals with something like the "Hospitalier's PRIDE and DOJOY's I practiced "Kung-fun-dance" ... a fusion of something like a hotel and a school that probably looks very much like a university with classrooms and dorms and dining halls all fit into a single building. I imagine a series of 2 or 3 "room changes" as in you walk from the one where you get the book and talk about it ... to the one where you talk about "what everyone else said about it" and maybe another one that actually connects you to other people with something like Facebook's Portal; the point of the whole thing to really quickly "rubber stamp" the need for an end to "bars in the sky" nonalcoholic connotation--as in "overcoming the phrase the sky is the limit" and showing us the need for a beacon of glowing hope fulfilled--probably actually the vision of a holographic marker turning into actual rings around the single moon of Earth, the focus of the song announcing the dawn of the age of Aquarius---

      It might lead us also to Ceres; and another set of artificial rings, or to Monoceros and a rehystorical understanding of the birthplace and birthing of the "river roads" that bridge the "space gaps" in the galaxy from our "one giant leap for mankind" linking the Apollo moon landing to the mythological connection to the sun; and connecting how the astrological charts of the ancients might detail a special kind of overlapping--the link between Earth's SOL and something like Proxima or Alpha Centauri; and how that "monostar bridge" might overlap to Orion and from there through Sagittarius and the center of the Milky Way ... all the way to Andromeda and more dreams of being in a place where there's a map to a tri-galactic system in the constellation Cancer and a similar one in Leo ... and just in case you haven't noticed it--a special marker here, I thought to myself it might be cool to "make an acronymic tie to Monoceros" and without even thinking auto-wrote Orion (which was the obvious constellation next to Monoceros, in the charts) and then to Sagittarius; which is the obvious ... heart of our astrological center and link to "other galaxies."

      ----I've dreamt or scriven or reguessed numerous times how the Milky Way's map to an "Atlas marked through time by the ages and the ancients" might tie this place and this actual map to the creation of the railways between stars to the beginning and the end of time and of course to this message that links it all to time travel. There's a few "guesses" I've contemplated; that perhaps the Milky Way chart is a metal-cosmic or microcosmic map to the dawn of time in the galactic vision of ... just after the big bang; or it might tie to a map of something like the unthinkable--a civilization that became so powerful it was able to reverse the entropy of "cosmic expansion" and reverse the thing Asimov wrote of in "The Last Question" as the end of life and the ability to survive basically due to "heat loss."

      "The Last Question." (And if you read two, why not "The Last Answer"?). Find these readings added to our collection, 1,000 Free Audio Books: Download Great Books for Free.


      * all "asterisks" in the above document denote a sort of Adamic unspoken relationship between notations and meanings; here adding the "Latin word for three" and source of the phrase "t.i.d." (which is doctor/pharmacy latin for "three times a day") where the "t" there is an abbreviation of "ter" ... and suppose the link between K and 11 and 3 noting it's alphanumeric position in the English alphabet as the 11th letter and only linking cognitively to three via the conversion between hex, and binarryy ... aberrative here is the overlapping "hakkasan" style (or ZHIV) lack of mention of the answer in "state of Kansas" and the "citystate of Slovakia" as described in the ICANN document linked [in] the related subsection or slice of the word "binarry" for the state of India. Tetris could be spelled with the addition of only a single letter [in] "tea"---the three letters "ris" are the hearts of the words "Christ" and "wrist" [and arguably of Osiris where you also see the round table character of the solar-system/sun glyph and the chemical element for The Fifth Element (as def. by i) via "Sinbad" and "Superman." The ERIS Free Network should also be mentioned here in connection with the IRC network I associate in the place between skipping stones and sacred hearts defined by "AOL" and "Kdice" in my life. In the lexicon of modern HTML, curly braces are generally relative to "classes" and "major object definitions (javascript/css)" while square brackets generally only take on computer-interpreted meaning in "Markdown" which is clearly (by definition, by this character set "[]") a superset (or at least definitely not a subset) of HTML.
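One possible reading of the "K, 11 and 3" link in the note above can be worked through as a small Python sketch. It is an interpretation rather than something the note states outright: it assumes the link means taking K's position in the alphabet (11) and then re-reading the digits "11" as a binary number. The helper name `letter_position` is hypothetical.

```python
import string


def letter_position(letter: str) -> int:
    """Return the 1-based position of a letter in the English alphabet."""
    return string.ascii_uppercase.index(letter.upper()) + 1


# K is the 11th letter of the English alphabet.
k_position = letter_position("K")

# Assumed reading: the digits "11" taken as a binary number give three.
k_digits_as_binary = int(str(k_position), 2)

# For the hex side of the note: decimal 11 is the single hex digit "b".
k_position_in_hex = format(k_position, "x")

print(k_position, k_digits_as_binary, k_position_in_hex)
```

Under that assumption the three numbers in "K=3:11" line up: the letter gives 11, and 11 read in base two gives 3.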

      Dr. Will Caster (Johnny Depp) is a scientist who researches the nature of sapience, including artificial intelligence. He and his team work to create a sentient computer; he predicts that such a computer will create a technological singularity, or in his words "Transcendence". His wife, Evelyn (played by Rebecca Hall), is also a scientist and helps him with his work.

      Following one of Will's presentations, an anti-technology terrorist group called "Revolutionary Independence From Technology" (R.I.F.T.) shoots Will with a polonium-laced bullet and carries out a series of synchronized attacks on A.I. laboratories across the country. Will is given no more than a month to live. In desperation, Evelyn comes up with a plan to upload Will's consciousness into the quantum computer that the project has developed. His best friend and fellow researcher, Max Waters (Paul Bettany), questions the wisdom of this choice, reasoning that the "uploaded"

      Just from my general understanding and memory "st" is not ... to me (specifically) an abbreviation of "state" but "ste" is a U.S. Postal code (also "as I understand it") for the name of a special room or set of rooms called a "suite" and in Adamic "connotation" I sometimes read it as "sweet" ... which has several meanings that range from "cool" to "a kind of taste sensation" to "easy to sway or fool."

      If you asked me though, for instance if "it" was an abbreviation or shorthand notation or acronym for either "a United state" or "saint" ... you'd be sure.

      While it's clear from studying linguistic cryptography ... (If I studied it a little here and some there, it's also from the "universal translator of Star Trek") and the personal understanding that language is a kind of intelligent code, and "any code is crackable" ... that I caution here that "meaning" and "face value" often differ widely and wildly ... even in the same place or among the same group of people ... either varying over time or heritage.

      Menelaus, in Greek mythology, king of Sparta and younger son of Atreus, king of Mycenae; the abduction of his wife, Helen, led to the Trojan War. During the war Menelaus served under his elder brother Agamemnon, the commander in chief of the Greek forces. When Phrontis, one of his crewmen, was killed, Menelaus delayed his voyage until the man had been buried, thus giving evidence of his strength of character. After the fall of Troy, Menelaus recovered Helen and brought her home. Menelaus was a prominent figure in the Iliad and the Odyssey, where he was promised a place in Elysium after his death because he was married to a daughter of Zeus. The poet Stesichorus (flourished 6th century BCE) introduced a refinement to the story that was used by Euripides in his play Helen: it was a phantom that was taken to Troy, while the real Helen went to Egypt, from where she was rescued by Menelaus after he had been wrecked on his way home from Troy and the phantom Helen had disappeared.


      Μυκῆναι, Μυκήνη


      The Lion Gate at Mycenae, the only known monumental sculpture of Bronze Age Greece

      Coordinates: 37°43′49"N 22°45′27"E


      Mycenae (Ancient Greek: Μυκῆναι or Μυκήνη, Mykēnē) is an archaeological site near Mykines in Argolis, north-eastern Peloponnese, Greece. It is located about 120 kilometres (75 miles) south-west of Athens; 11 kilometres (7 miles) north of Argos; and 48 kilometres (30 miles) south of Corinth. The site is 19 kilometres (12 miles) inland from the Saronic Gulf and built upon a hill rising 900 feet (274 metres) above sea level.[2]

      In the second millennium BC, Mycenae was one of the major centres of Greek civilization, a military stronghold which dominated much of southern Greece, Crete, the Cyclades and parts of southwest Anatolia. The period of Greek history from about 1600 BC to about 1100 BC is called Mycenaean in reference to Mycenae. At its peak in 1350 BC, the citadel and lower town had a population of 30,000 and an area of 32 hectares.[3]

      3. Chew 2000, p. 220; Chapman 2005, p. 94: "...Thebes at 50 hectares, Mycenae at 32 hectares..."

      Melpomene (/mɛlˈpɒmɪniː/; Ancient Greek: Μελπομένη, romanized: Melpoménē, lit. 'to sing' or 'the one that is melodious') was initially the Muse of Chorus; she then became the Muse of Tragedy, for which she is best known now.[1] Her name was derived from the Greek verb melpô or melpomai meaning "to celebrate with dance and song." She is often represented with a tragic mask and wearing the cothurnus, boots traditionally worn by tragic actors. Often, she also holds a knife or club in one hand and the tragic mask in the other.

      Melpomene is the daughter of Zeus and Mnemosyne. Her sisters include Calliope (muse of epic poetry), Clio (muse of history), Euterpe (muse of lyrical poetry), Terpsichore (muse of dancing), Erato (muse of erotic poetry), Thalia (muse of comedy), Polyhymnia (muse of hymns), and Urania (muse of astronomy). She is also the mother of several of the Sirens, the divine handmaidens of Kore (Persephone/Proserpina) who were cursed by her mother, Demeter/Ceres, when they were unable to prevent the kidnapping of Kore (Persephone/Proserpina) by Hades/Pluto.

      In Greek and Latin poetry since Horace (d. 8 BCE), it was commonly auspicious to invoke Melpomene.[2]

      See also [AREXMACHINA]

      Flagstaff (/ˈflæɡ.stæf/ FLAG-staf;[6] Navajo: Kinłání Dookʼoʼoosłííd Biyaagi, Navajo pronunciation: [kʰɪ̀nɬɑ́nɪ́ tòːkʼòʔòːsɬít pɪ̀jɑ̀ːkɪ̀]) is a city in, and the county seat of, Coconino County in northern Arizona, in the southwestern United States. In 2018, the city's estimated population was 73,964. Flagstaff's combined metropolitan area has an estimated population of 139,097.

      Flagstaff lies near the southwestern edge of the Colorado Plateau and within the San Francisco volcanic field, along the western side of the largest contiguous ponderosa pine forest in the continental United States. The city sits at around 7,000 feet (2,100 m) and is next to Mount Elden, just south of the San Francisco Peaks, the highest mountain range in the state of Arizona. Humphreys Peak, the highest point in Arizona at 12,633 feet (3,851 m), is about 10 miles (16 km) north of Flagstaff in Kachina Peaks Wilderness. The geology of the Flagstaff area includes exposed rock from the Mesozoic and Paleozoic eras, with Moenkopi Formation red sandstone having once been quarried in the city; many of the historic downtown buildings were constructed with it. The Rio de Flag river runs through the city.

      Originally settled by the pre-Columbian native Sinagua people, the area of Flagstaff has fertile land from volcanic ash after eruptions in the 11th century. It was first settled as the present-day city in 1876. Local businessmen lobbied for Route 66 to pass through the city, which it did, turning the local industry from lumber to tourism and developing downtown Flagstaff. In 1930, Pluto was discovered from Flagstaff. The city developed further through to the end of the 1960s, with various observatories also used to choose Moon landing sites for the Apollo missions. Through the 1970s and '80s, downtown fell into disrepair, but was revitalized with a major cultural heritage project in the 1990s.

      The city remains an important distribution hub for companies such as Nestlé Purina PetCare, and is home to the U.S. Naval Observatory Flagstaff Station, the United States Geological Survey Flagstaff Station, and Northern Arizona University. Flagstaff has a strong tourism sector, due to its proximity to Grand Canyon National Park, Oak Creek Canyon, the Arizona Snowbowl, Meteor Crater, and Historic Route 66.

      PSANSDISL #LWDISP either without gas or seeing cupidic arroz in "thank you" or "allta, wild" ...

      pps: a magnanimous decision ...

      I stand here on the brink of what appears to be total destruction; at least of everything I had hoped and dreamed for ... for the last decade in my life which appears literally to span thousands of years if not more in the eyes of some other beholder. I spent several months in Kentucky telling a story of a post apocalyptic and post-cataclysmic delusion; some world where I was walking around in a "fake plane" something like a holodeck built and constructed around me as I "took a walk around the world" to ... it did anything but ease my troubled mind.

      Recently a few weeks in Las Vegas, and a similar story; telling as I walked penniless down the streets filled with casinos and anachronistic taxi-cabs ... some kind of vision of the entirety of the heavens or the Earth or the "choir of angels" I think of when I echo the words Elohim and Aesir from mythology ... there with me in one small city in superposition; seeing what was a very well put together and interesting story about a "star port" Nirvane ... a place that could build cities into the face of mountains and half working monorails appearing in the sky---literally right before my eyes.

      I suppose this is the place "post cataclysm" though I still have trouble understanding what it is that's actually about ... in my mind it connects to the words "we are losing habeas" echoed from the streets of Los Angeles in a more clear and more military voice than usual--as I walked block by block trying to evade a series of events that would eventually somehow connect all the way to the "outskirts of Orlando, Florida" in a place called Alhambra.

      Apparently the name of a castle; though I wasn't aware of that until much later.

      It doesn't feel at all like a "cataclysm" to me; I see no great rift--only a world filled with silent liars, people who collectively believe themselves to have stolen something--something gigantic--at least that's the best interpretation of the throws and impetus behind the thing that I and mythology together call Jormungandr. With an eye for "mythological connections" you could clearly see that name of the Great Serpent of Revelation connects to something like the Unseelie; the faeries of Gaelic lore. To me though this world seems still somewhat fluid, it's my entire life--moving from Plantation to a place where the whole of it might be Bethlehem and to "clear my throat" it's not hard to see here how that land of "coughs" connects to the Biblical land of Nod and to the "Adamically sieved" Snifleheim ... from just a little twist on the ancient Norse land most probably as close to Hel as anyone ever gets--or so I dream and hope---still today. It all looks so real and so fake at the same time; planned for thousands of generations, the culmination of some grand masterpiece story that certainly ties history and myth and reality into a twisted heap of "one big nothing, one big nothing at all."

      I've tried to convey to the world how important I believe this place and this time to be--not by some choice of my own ... but through an understanding of the import of our history and the impact of having it be so obviously tuned and geared towards this specific time ... many thousands of years literally all focused on a single moment, on one day or one hour or even just a few years where all of that gets thrown down on the table as if some trump card has been played--and whether or not you fathom the same magnanimous statement or situation or position ... to me, I think it depends on whether or not you grew up in the same kind of way, believing our history to be so fixed and so difficult to change. I don't particularly feel like that's the "zeitgeist" of today; I feel like the children believe it to be some kind of game, and that it is such an easy thing to "sed" away or switch and turn into something else--another story, another purpose ... anyone's personal fantasy land come true.

      I don't think that's the case at all, it's clearly a personal nightmare; and it's clearly one we've seen time and time again--though not myself--the Jesus Christ that is the same yesterday, today; and once again perhaps echoing "no tomorrow" never remembers or believes that we've "seen it all before" or that we've ever really gotten the point; the thing you present to me as "factual reality" is a sickness, it disgusts me; and I'd do anything to go back to the world "where I was so young, and so innocent" and so filled with starry-eyed hope that we were at the foot of something grand and amazing that would become an empire turned republic of the heavens; filling the stars ... with the kind of love for kindness and fairness that I once associated very strongly with the thing I still believe to be the American Spirit.


      "Suddenly it changes, violently it changes" ... another song echoes through the ages--like the "words of the prophets dancing ((as light)) through the air" ... and I no longer even have a glimmer of hope that the thing I called the American People still exist; I feel we've been replaced by some broken container of minds, that the sky itself has become corrupt to the point that there's no hope of turning around this thing that I once believed with all my heart and all my mind was so obviously a "designed downward spiral" one that was---again--so obviously something of a joke, intended to be easy to bounce off a false bottom and springboard beyond "escape velocity" and beyond the dark waters of "nearest habitable star systems (being so very far away)" into a place where new words and new ideas would "soar" and "take flight."

      Here though; I am filled with a kind of lonely sadness ... staring at what appears to be the same mistake(s) happening over and over again; something I've come to call "skipping stones in the pond of reality" and really do liken it to this thing that appears to be the new meaning of "days" and ... a civilization that spends absolutely no love or lust to enter a once sacred and holy place and tarnish it with their sick beliefs and their disgusting desires. You all ... you appear to be some kind of springboard to "bunt" forth yet another age or era of nothingness into the space between this planet and "none worth reaching" and thank God, out of grasp. Today, I'd condemn the entirety of this world simply for its lack of "oathkeepers" and understanding of what the once hallowed words of Hippocrates meant to ... to the people charged and dharmically required to heal rather than harm.

      It appears the place and time that was once ... at least destined to be the beginning of Heaven ... has become a "recurring stump" of some future unplanned and tarnished by many previous failed efforts and attempts to overcome this same "lack of conversation or care" for what it meant to be "humane" in a world where that was clearly set high aloft and above "humanity" in the place where they--where we were the best nature had to offer, the sanest, the kindest; the shining last best hope.


      Today I write almost every day ... secretly thanking "my God" for the disappearance of my tears and the still small but bright hope that "Tearran" will one day connect the Boston Tea Party and the idea that "render to Caesar" and Robin of Loxley ... all have something to do with a re-ordering of society and the worth and import of "money" ... to a place that cares more for freedom from murder than it does ... "freedom from having to allow others to hear me speak." I hold back tears and emotions; not by conscious choice or ability but ... still with that strange kind of lucky awkward smile; and secretly not so far below the surface it's the hope of "a swift death" that ... that really scares me more than the automatons and mechanical responses I see in the faces of many drivers as they pass me on the street--the imagery of connecting it to the serpentine monster of the movie Beetlejuice ... something I just "assume" the world understands and ... doesn't seem to fear (either); as if Churchill had gotten it all wrong and backwards--the only thing you have to fear, is the loss of fear of "loss."


      Here my crossroads---halfway between the city my son lives in and the city my parents live in--it's on making a decision on whether I should continue at all, or personally work on some kind of software project I've been writing about, or whether I should focus on writing about a "revolution" in government and society that clearly is ... "somewhat underway." In my mind it's obvious these things are all connected; that the software and the governance and the care of whether or not "Babylon" is remembered as a city of great laws and great change or a city of demons and depravity ... that these things all hinge and congeal around a change in your hearts; hoping you will choose to be the beginning of a renaissance of "society and civilization" rather than the kings and queens of a sick virtual anarchy ... believing yourselves to have stolen "a throne of God" rather than to literally be the devastating and demoralizing depreciation of "lords and fiefdoms" to something more closely resembled by the time of the Four Horsemen depicted in Highlander.

      These words intended to be a "forward" to yet another compliment of a ((nother installment of a partial)) chain of emails; whimsically once half-joking ... I called it the Great Chain of Revelation. The software too; part of the great chain, this "idea" that the blockchain revolution will eventually create a distributed and equal governance structure, and a rekindling of monetary value focused on "free and open collaboration" rather than "survival of the most unfit"--something society and civilization seem to have turned the "call of life" from and to ... literally just in the last few years as we were so very close to ... reaching beyond the Heaven(s).

      I don't think it's hard to imagine how a "new set of ground rules" could significantly change the "face of a place" -- make it something shiny and new or even on the other side of the coin, decayed or depraved. It's not hard to connect the kind of change I'm hoping for with "collision protection" and "automatic laws" to the (perhaps new, perhaps ... ancient) Norse creation story of the brothers of Odin: Vili and Ve.

      It might be hard to see today how a new "kind of spiritual interaction" might be only a few "mouse clicks" away though--how it could change everything literally in a flash of overnight sensation ... or how it might take something like a literal flash of stardom (or ... on the other hand, something like totalitarian or authoritarian "iron fisting") to make a change like this "ubiquitous" or ... something like the (imagined in my mind as ... messianic) "ED" of storming through the cosmos or the heavens and turning something that might appear to be "free and perfect feeling" today into a universe "civilized overnight" and then ...

      I wonder how long it would take to laud a change like that; for it to be something of a voluntary "reunderstanding" of a process ... to change the meaning of every word or every thought that connects to the process of "civilization" to recognize that something so great and so powerful has happened as to literally change the meaning of the word, to turn a process of civilization into something that had a ... "signta-lamcla☮" of foreboding and then a magical staff struck into the heart of a sea and then ... and then the word itself literally changes to introduce a new "mid term" or "halfway point" in which a great singularity or enlightenment or change in perspective or understanding sort of acknowledges ...

      that some "clear outside" force not only intervened on the behalf of the future and the people of our world but that it was uniquely involved in the whole of--

      "waking up" tio a nu def of #Neopoliteran.

      ^Like the previous notation; the below text comes from an email previously sent; and while I stand behind things like my sanity, my words; and my continued and faithful attempt to speak and convey both a useful and helpful truth to the world---sometimes just a single day can make all the difference in the world.

      Sometimes it's just a single moment; a flash or a comment about ^th@ blink of an eye" ... and I've literally just "thought up/had/experienced/transitioned thru" that exact moment. The lies standing between "communication" and either "cooperation" or .... some other kind of action have become more defined. More obvious. Because of this clarification; like a kind of "ins^tant* gnosis"

      ... search high and lo ... the depths all the way to above the heavens ... for a festive divorce ceremonial ritual ... that looks something like a bachelor party ':;]

      --- @amrs@koyu.SPACe ... @suzq@rettiwtkcuf.social (@yitsheyzeus) May 22, 2020

      I ... TERON;

      Gjall are painting me into a corner here; and I don't see around it anymore--I don't see the light, and I don't see the point. I was a happy-go-lucky little kid in my mind; that's not "what I wanted to be" or what I wanted to present, it's who I was. I saw "Ashkenazi" and ... know I am one of those ... and I kind of understood that something horrible might have happened, or might happen here--and I kind of understand that crying smashing feeling of "to ash" that echoes through the ages in the potpourri songs about pockets full of Parker Posey .. and ancient Psalms about "from the ashes of Edom" we have come--and from that you can see the cyclical sickness of this ... place so sure it's "East of Eden" and yet gung-ho on barrelling down the same old path towards ash and towards Edom and towards ... more of Dave's "ashes to ashes dust to dust" and his "smoke clouds roll and symphony of death..." and few words of solace in a song called Recently that I imagine was fleeting and has recently come and gone--people stare, I can't ignore the sick I see.

      I can't ignore his "... and tomorrow back to being friends" and all but wonder who among us doesn't realize it's "ash" and "gone" and "no memory of today" that's the night between now and ... a "tomorrow with friends" not just for me--but for all of you--for this place that snickers and pantomimes some kind of ... anything but "I'm not done yet" and "there's more ... vendetta ... and retribution to be had, Adam ... please come back in a few more of our faux-days." This is sickness; and happy-go-lucky Himodaveroshalayim really doesn't do much but complain about that word, the "sickle" and the tragic unavoidable ... ash of it all ... these days--you'd think we could "pull out" of this mess, turn another way; smile another day, but it seems there's only one way to get to that avenu in the mind of ... "he who must not know or be me."


      I have to admit I found some joy in the epiphany that the hidden city of Zion and its fusion with the Namayim' version of how that "Ha" gels and jives with the name Abraham and the Manna from Heaven and the bath salt and the tina and the "am in e" of amphetamine--maybe a glimmer or a shimmer or a glow of hope at the moment "Nazion" clicked ... and I said ... "no, not me ... I'm nothing like a king, no dreams of authoritarianism at all in the heart of Kish@r;" even as I wrote words that in the spirit of the moment were something of a "tis of a'we" that connected to my country and the first sing-songy "tisME" that I linked to trying to talk in the rhyming spirit of some "first Christ" that probably just like me was one limerick away from the end of the rainbow and one "Four Non Blondes" song away from tying "or whatever that means" and this land crowned with "brotherhood" (to some personal "of the Bell, and of the bell towers so tall and Crestian") to just one Hopp skip and jump away from the heart of the obvious echoes of a bridge between haiku and Heroku... a few more gears shift into place, a click and a mechanical turn of the face of the clock's ku-ku striking ... it was the word "Earthene" that was the last "Jesusism" around the post Cimmerian time linking Dionysus and Seuss to that same "su-s" that's belonging to a moment in the city of Uranus--codified and etched in stone as "MCO"--not just for its saucer and warp nacelles and "deflector dish" but for its underground caverns and its above ground "Space Mountain" and that great golf ball in the heart of it all.

      The gears of time and the dawns of civilizequey.org query the missing "here" in our true understanding of what "in the beginning, to hear; to here ... to rue the loss of the Maize from Monoceros to the VEGA system and the tri-galactic origin of ... "some imaginary universal ... Earthene pax" to have dropped the ball and lost it all somewhere between "Avenu Malkaynu" and melaleuca trees--or Yggdrasil and Snifleheim--or simply to miss the point and "rue brickell" because of bricks rather than having any kind of love or nostalgia linking to a once cobblestone roadway to the city in the Emerald skies paved in golden "do not return" signs ... to have lost Avenues well after not realizing it was "Heaven'es that were long gone far before I stepped foot on this road once called too Holy for sandals" in a place where that Promised Land and this place of "K'nanites" just loses its grip on reality when it comes to mentioning the possibility that the original source and story of Ca'anan was literally designed to rid the world of ... "bad nanites" and the mentality of ... vindictiveness that I see behind every smirk.

      The final hundred nanoseconds on our clock towards doom and gloom cause another bird to fly; another snake to curl up and listen again to the songs designed to charm it into oblivion; whether that's about a club in South Beach or a place not so far from our new "here..." all remains to be seen in my innocent eyes wondering what it truly is that stands between what you are ... and finding "forgiveness not needed--innocent child writes to the mass" ... and the long arm of the minute hand and the short finger of the hour for one brief moment reconcile and move towards "midnight" together; and it's simply idyllic, the Nazarene corner between nil and null you've relegated the history of Terran poast futures into ... "foreves mas" or so they (or you) think.


      I'm still so far from "Five Finger Death Punch" though; and so far from Rammstein and so far from any kind of sick events that could stand between me and "the eternal" and change my still "casual alternative rock" loving heart to something more death metal; I rue whatever lies between me and there being any kind of Heaven that thinks there could exist a "righteous side" of Hell and it... simultaneously.


      I still see light here in admonishing the masses and the angels standing against the story and the message God brings us in our history. I still see sparks in siding with the "causticness" of "no holodecks in sight" and the hunger and the pain of simulating ... "the hells of reality" over the story of decades or centuries of silence refusing to see "holography" and "simulated" in the word Holocaust and the horrors of this place that simply doesn't seem to fathom or understand the moments of hunger pangs and the fear of "dark Earth pits" or towers of "it's not Nintendo-DS" linking the Man in the High Castle to an Iron Mask.

      I rally against being what I clearly am raised high on some pedestal by some force beyond my comprehension and probably beyond that of the "perfect storm in time" that refuses to itself acknowledge what it means to gaze at such an unfathomable loss of innocence at the cost of a "happy and serene future" or even at the glimmer of the Never-Never-Land I'd hoped we would all cherish and love and share ... the games and the newfound freedom that comes not just from "seeing Holodeck" turn into "no bullets" and "no cages" but into a world that grows and flourishes into something that's so far beyond my capability to understand that I'm stuck here; dumbfounded; staring at you refusing to stop car accidents and school shootings ... because "pedestal." For the "fire and the glory" of some night you refuse to see is this one--this place where morality rekindles from ... from what appears to be one small candle, but truly--if it's not in your heart, and it's not coming from some great force of goodness--fear today and a world of "forever what else may come."


      Here in a place the Bible calls Penuel at the crossing of a River Jordan ... the Angel of the Lord notes the parallels in time and space between the Potomac and the Rhine--stories of superposition and cities and nation-states that are nothing more than a history of a history of things like the Monoceros "arroz" linking not just to the constellation Orion but to Sagittarius and to Cupid and of course to the Hunter you know so well--

      Searching for a Saturday; a sabbath to be made Holy once more ... "at the Rubycon"

      The Einstein-Rosen Wormhole and the Marshall-Bush-JFKjr Tunnel

      The waters are called narah, (for) the waters are, indeed, the offspring of Nara; as they were his first residence (ayana), he thence is named Narayana.

      --- Chapter 1, Verse 10[3]

      In a semi-fit of shameless arexua-self recognition I'm going to mention Amazon's new series "Upload" and connect it to the PKD work that my Martian-in-simulacrum-curriculum-vitae on "colonization education" ... tying together Transcendence, Total Recall and ... well; to be honest it actually gave me another "uptick" in the upbeat ... maybe I'll stick around until I'm sure there's at least one more copy of me in the ivrtual-invverse ... oh, that reminds me ... Farmer's Lord of Opium also touches on this same "mind of God in the computer" subject (which of course leads to Ghost in the Shell and Lucy--thanks Scarlette :).

      While I'm listing Matrix-intersected pieces of the puzzle to No Jack City, Elon Musk's neuralace and Anderson's Feed are also worth a mention. Also the first link in this paragraph is titled ... "the city of the name of time never spoken after time woke up and stfu'd" (which of course is the primary subject of this ... update to the city Aerosol).

      The ... "actual original typed dream" included a sort of "roller coaster ride" through space all the way to Mars; where the real purpose of "the thing" I am calling the "Mars Hall" was to display previous victories and failures ... and the introduction of "older or future" culture's suggestions for "the right way" to colonize a new habitat. If it were Epcot Center, this would be something like Space Mountain taking you to the future of "Epcot Countries" as if moving from "countries" to planets were as easy as simply ... "reading backwards."

      THE SOFTWARE, SINGERS, AND SHIELD(S)

      OF

      HEIROSOLYMITHONEYY

      Thinking just a little bit ahead of myself, but I'm on "Unreal Object/Map Editor within the VR Server" and calling it something like "faux-wet-ware" ... which then of course leads to a similar onomatopoeia of "weapons and ..." wherewithal to find a better singer's name to connect the road of "sword" to a Wo'riordan ... but I think that fusion of warrior and woman probably does actually say ... enough of it all; on this road to the living Bright Water that the deity in my son's middle name defines well here, as "waking up," stretching its tributaries and its winding wonders and wistfully ....

      Narayana (Sanskrit: नारायण, IAST: Nārāyaṇa) is known as one who is in yogic slumber on the celestial waters, referring to Lord Maha Vishnu. He is also known as the "Purusha" and is considered the Supreme being in Vaishnavism.

      andromedic; the ports of call ... to the mediterranean (literally) from the gulf coast;

      ... who engages in the creation of 14 worlds within the universe as Brahma when he deliberately accepts rajas guna, himself sustains, maintains and preserves the universe as Vishnu by accepting sattva guna. Narayana himself annihilates the universe at the end of maha-kalp ...

      .

      there's no place like home. there's no place like home. there's no place like home.

      and so it begins ... "f:

      r e l i g i o n

      find out what it means to me. faucet, ever single one, stream of purity ...

      from Fort Myers ... f ... flicks ... Flint.

      "

      ^this notation will from this email forward in linear time denote some form of contact method or information related to the context of the message you are reading. This particular one sends me an encrypted email. If there is an "@" symbol involved in the "anchor's hypertext reference" (technically an "a href=" in HTML4) your browser should attempt to open an email client to send a message over an anonymous SMTP relay. Understand that "anonymous" in this case may or may not mean your sending email address is hidden or obfuscated--so if you want to receive a reply you must include it in the DATA of your SMTP transmission defined by the RFC5321 attached. In most cases "anonymous" also means that you will not have the recipient's direct contact information unless they have made it public---additionally the exact server/system/relay used may or may not be the "Sbroken Berkman Perl Script" linked to in the "hypertext reference" specifically anchored to the words "an anonymous SMTP relay" above.

      A simple "hat character" (^) and the letter "t" as you see beginning the above paragraph will denote a contact method or form that works over the internet using an HTTP protocol defined in a series of RFC's including (but not limited to) RFC's numbered as 2616, 7230, 7235, 2068 and use a simple language which is based on a definition suggested or proposed currently by an organization called the "W3C Consortium"

      ---and ... previously set and defined by an organiza^tion located at html.spec.whatwg.org; which appears (to me, for the first time as I write these words) to follow the conceptual spirit of the "living document" defined by the several "Continental Congresses, et alia." I personally now conjoin this document in my head to a procession of patrilineal or matrilineal predecessors to the actual event .... still to be defined ... but related to this specific email, this mailing list; its contributors and readers as well as actual members of the organization (still to be created, defined, or named) that creates a "round table" of members that is open to the public, to all voters educated enough to understand the specific issue being voted on (up to a standard that; in this place and time appears to be unset and unmet but materially related to reaching the age of 18 years old; growing up in or being born in the United States of America (related spec. to the Constitution of the United States of America which is officially "self-defined" through a process which includes all three branches of the government which it also "self-defines" and purports to be "of, for, and by the people"--though the general population is only able to contribute through an indirect process (read:the people cannot directly contribute to the constitution without either running for office (like a senator) or being appointed to a specific government position (like a judge or executive branch public servant).

      The current state of American representative democracy is the highest standard to which I am currently knowledgable of "extant"--and it is specifically substandard, inferior, and "just not good enough" as a comparison to the process required to vote in the organization being "self-defined" through this process*. It is my sincere and clear hope that "this process" will result in a legal and moral amendment to the document shown in the previous link and presented by the Legislative Branch of the United States here. It is my current and faithful belief that anything else would also be significantly below the standards morally required by "this process" which of course includes over 200 years of American citizenship and (other international relations; i.e.e.gfor "iv" exampleid estexemplia gratia) as well as the Sons of Liberty and prior to that contributions from the Crown and the "Parliament and Crown" of the United Kingdom; among others et alea's ifndef: 'swikipedia/et_al..

      To note specifically because of lack of personal knowledge and public notoriety (assuming all other requiremnant* achem requirements)

      alas, babylon.

      i listened to a man yesterday who was talking about "true heroes" ... he of course noted jesus christ and superman together, suggesting the first was one, and the second just a fiction. he also talked about people like gandhi and "leaders who use non-violent means to "change the world." i at least agree with him on the third, gandhi is a good prototype for some kind of hero. staring at this ... "to be completed" work on tales of two cities, whether from sodom and gomorrah all the way to athens and sparta and perhaps even london and paris--and this particular city, babylon; it stands out as one which truly has no equal or even "mirror" in the history of the world. i suppose i'd add "alexandria" and suggest the library and the laws; something that are fundamental to the ethos of the planet i call "athens."

      i imagine he did not know "hammurabi's" name; and even today in this place where i ask and do not receive answers; i imagine you still don't connect muhammad or amsterdam ... to this king who in our history is set apart and lifted high on a pedestal of having "codified and written down" laws ... for the very first time. it's almost comical, it took me a paragraph and a sentence to connect "the king and i" to this mirror world, where the bible and the people have most assuredly decided "babylon" is a negative thing or a depraved place.

      "fallen, fallen, is [the city of] babylon the great"

      ... just a quote from one of my favorite movies; which of course is re-quoting "dante" and/or "the bible"

      "a dwelling place [of] (the) demons (say), it has become."

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank the Review Commons editor and three reviewers for their enthusiastic response, including their constructive suggestions and appreciation of the high impact and originality of our study. We have completed the revisions and new analyses suggested by the reviewers, and we thank the reviewers for their suggestions to increase the impact and interest in this work and for guiding us towards this much improved manuscript.

      In this response letter, we present the response to each reviewer comment and associated revisions made to the text and figures as bullet points below the reviewers' text (black text).

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Yang et al. took advantage of recently published long-read-based genomic sequences of nearly homozygous genomes from complete hydatidiform moles to retrieve allelic sequences of LINE-1, currently the only active and autonomous retrotransposon of the human genome, and produced the repertoire of intact LINE-1s in a genome. The authors performed cell-culture-based retrotransposition assay measurements and in vivo fitness estimations of all identified intact LINE-1s to infer evolutionary dynamics. In this article, the authors further validate the major contribution of polymorphic LINE-1s to the de novo retrotransposition events in the human genome. They also described, at unprecedented resolution, allelic variations among LINE-1 loci and the potential impact of these variations on the interpretation of the mutagenic potential of each LINE-1 locus.

      Major comments:

      1 - The key conclusions of the article are mostly convincing. However, it would be a substantial improvement to consolidate the data of the article with information about known active LINE-1s in germ cells or in cancer by using data from recent publications of the Devine and Tubio labs (for example PMID: 34772701, 32024998, 25082706). Across the article, no mention is made of the transductions generated during LINE-1 de novo retrotransposition, which is instrumental to monitor in vivo activity of a group of LINE-1 active copies. It would be of particular interest to make a link between in vitro activity from this study with LINE-1 classification based on their observed activity in cancer (PMID: 32024998, Figure 3b).

      • We thank this and the other reviewers for this suggestion. We agree that a more explicit comparison to the often-reported counts of 3’ transductions would be a valuable addition to our analyses. We have added the 3’ transduction counts from PMID:34772701, PMID:32024998 and PMID:25082706 to Table S2 (columns Y, Z and AA), and made a comparison between these data and our Hamming-distance-based in vivo activity, as the new Figure S5. We found correlations between the two measurements in a significant proportion of LINE-1s, but some interesting exceptions exist, which likely reflect the fact that most catalogued 3’ transductions come from cancer genomes, and cancer and germline cells represent distinct cellular environments in which distinct sets of LINE-1s are able to replicate (and leave 3’ transductions). In addition to the new figure (Figure S5), we have added a discussion paragraph focused on this interesting comparison.
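      A comparison of this kind can be illustrated with a rank correlation. The following is only a minimal sketch: the response does not say which statistic was used for Figure S5, so the choice of Spearman's coefficient, the function names, and the toy numbers are all assumptions made for illustration.

```python
# Illustrative only: Spearman rank correlation between two per-locus
# measurements (e.g. a 3' transduction count and a distance-based score).
# The statistic and all names here are assumptions, not the authors' method.

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy, made-up values for four hypothetical loci.
transduction_counts = [0, 3, 7, 12]
distance_scores = [9, 6, 4, 1]
rho = spearman(transduction_counts, distance_scores)
```

      A rank-based statistic is a reasonable default for count data with a few extreme loci, because it is insensitive to the scale of either measurement.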

      2 - The use of CHM1 BAC library Sanger sequencing validation and comparison with CHM13 and hg38 sequences is instrumental to support the building of the LINE-1 repertoire in the CHM1 genome, which is a valuable contribution of the article. The use of a distance-based metric to infer the fitness of a LINE-1 is an interesting approach and allows LINE-1 copies to be grouped based on their in vivo activity potential. Again, it would be beneficial to correlate the inferred fitness and retrotransposition activity of copies/alleles, when known, from the above-mentioned literature.

      • The sequence validation of LINE-1 sequences in CHM1 is an important point which we have addressed in the edited manuscript. Specifically, we used three forms of sequence validation including end-sequencing of one clone of each LINE-1 after it was cloned into the retrotransposition vector and whole-plasmid sequencing of select LINE-1s with discrepant activity amongst the three clones we assayed. In addition, we sequenced the entire LINE-1 sequence for four LINE-1s which had the largest number of mutations relative to their allelic counterpart in CHM13. Please see the above response to ‘Major comment 1’ for details of our new analysis comparing the previous literature to our data.
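      For readers unfamiliar with distance-based metrics of the kind mentioned in this exchange, the sketch below shows the general idea: count mismatches between aligned sequences and record, for each element, its distance to the closest other element. It is a generic illustration that assumes pre-aligned, equal-length sequences; the authors' exact procedure is described in their methods and is not reproduced here, and the function names and toy sequences are hypothetical.

```python
# Illustrative only: nearest-neighbour Hamming distance among aligned sequences.
# Assumes every sequence has already been aligned to the same length.
from typing import Dict

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbor_distance(aligned: Dict[str, str]) -> Dict[str, int]:
    """For each element, the smallest Hamming distance to any other element."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences to compare")
    return {
        name: min(
            hamming(seq, other)
            for other_name, other in aligned.items()
            if other_name != name
        )
        for name, seq in aligned.items()
    }

# Toy, made-up sequences for three hypothetical elements.
toy = {"L1_a": "ACGTACGT", "L1_b": "ACGTACGA", "L1_c": "TTGTACCA"}
nearest = nearest_neighbor_distance(toy)
```

      In this toy example the first two elements differ at a single position, so each has a close relative, while the third is at least three mismatches from everything else.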

      3 - Some aspects of the writing of the article should be improved to better support the conclusions.

      • We thank the reviewer for providing these examples of parts of the text that were particularly difficult to read and comprehend. We have deeply streamlined and improved the text throughout the manuscript based upon detailed editing for readability and clarity by two experienced scientific writers. Below, we detail how we revised the particular sections presented by the reviewer, but we think the entire manuscript is now more succinct and clearer.

      • In general, the descriptions are dense, and details could be provided in a more direct way to lighten the results section. Several redundancies in the discussion can be combined to increase clarity.

      • We have spent considerable time tightening up the text, including removing several overlapping sections from the discussion which can be seen in the included version with changes tracked.

      • There is a lack of clarity in the description of how each pair of alleles was handled when retrotransposition measurements vary between the study and the literature (last paragraph of the "Comprehensive measurement of LINE-1 in vitro activity in a human genome" section). It is not completely clear how the analysis was done, and the way the data is presented in File S3 does not help to support the conclusion. It could be useful to include some illustrative examples in a panel of Figure 2.

      • We agree that this description was hard to parse, and we have rewritten this and accompanying methods to simplify our explanation of these results. In addition, we have revised Figure 2 to show the data in much more detail. To further aid the logic flow related to this section, we moved the previous Figure 5B to Figure 2B, updated it with more suitable examples and edited the associated descriptions.

      • Regarding inferred in vivo activity, the text contains alternative description with the use of "fit" / "unfit", in vivo "active" / "inactive" or "no closely related LINE-1s" terms. The authors should find a way to clearly define and systematically use one set of terms to enhance clarity along the article. To parallel with in vitro active/inactive, it would be useful to use in vivo fit/unfit.

      • We thank the reviewer for this suggestion and agree with their suggested unified use of ‘in vivo fit/unfit’. To clarify and simplify these terms as much as possible, we added detailed explanations of in vivo / in vitro activity and systematically defined in vitro "active/inactive" (page 5, right column, line 50) and in vivo "fit/unfit" (page 8, left column, line 26) at their first appearance in the article, and we changed most instances of "in vivo activity" to "in vivo fitness" when context permits.

      4 - The authors suggest that in vitro activity can be predicted by integration of population frequency and in vivo activity (/fitness) (second paragraph of the "An analysis of LINE-1 evolutionary history [...] and in vivo activity" section). It would be beneficial to strengthen the writing of this section and ultimately validate/test the model by including data from some of the previous studies (e.g. Brouha 2003, Lutz 2003, Seleme 2006, Beck 2010, Rodriguez-Martin 2020, Chuang 2021).

      • We have thoroughly revised this section of the results (see response to ‘Major comment 3’ above), per the reviewer’s suggestion, to increase reader comprehension of this important data. In addition, we greatly appreciate the reviewer’s suggestion of a very interesting experimental direction – moving beyond a single long-read-based genome to many diverse genomes, and ultimately calculating the in vivo fitness of the LINE-1s from these diverse genomes. For a long time this has not been possible, but the recent publication of the Human Pangenome presents an opportunity to study this interesting question. Though beyond the scope of this paper, our lab is actively working on this fascinating question, and we appreciate the reviewer’s shared interest in this question.

      5 - The identification of adaptive mutations is only partially described and not strongly supported by experimental or analytical data. It would be interesting to explore the role of phylogenetically informative sites described in Figure 5B/C by testing non CHM1 alleles in retrotransposition assay (by introducing amino acid changes into the cloned CHM1 LINE-1 alleles) or by positioning the sites in ORF1p or ORF2p structure and/or domains to infer impact on functionality.

      • The reviewer rightly points out that this is one of the most interesting and novel findings of our manuscript. However, the testing of potentially adaptive mutations is complicated and nuanced. Specifically, we don’t know the mechanism by which these mutations might be adaptive. It is possible that they simply increase in vivo germline retrotransposition activity and this increase would be reflected by an increase of in vitro retrotransposition activity. However, another possibility is that these adaptive phenotypes only show themselves in vivo or in the context of the host restriction factors expressed in the germline. We strongly agree with the reviewer that experimental and analytical data on the phylogenetically informative sites associated with the Figure 5 phylogeny is the key to finding out the mechanisms by which these changes affect LINE-1 activity/fitness, and we are, indeed, exploring this very question in the lab now with related projects. We respectfully suggest that these (extremely cool) experiments are beyond the scope of this paper, but we have also added some more detailed description and analyses of the potentially adaptive LINE-1 variations from Figure 5 (from page 9, right column, line 50 to page 10, left column, line 5).

      Minor comments:

      1 - Regarding the in vitro retrotransposition assay, it would be beneficial to provide more data. The current Figure 2 could be enriched by the addition of data related to the variation in the replicates of the experiment (technical but mostly biological with the three clones per LINE-1 tested). Figure 2 could include a dashed line for 100% L1RP and 5% (since it is used as a threshold). It would be useful to provide an additional panel in Figure 2 to illustrate alleles of LINE-1 that are active in this study and compare the values obtained previously in other studies. Similarly, a supplemental table or alignment could be provided to document amino acid changes in the two alleles of each pair (see comment above in the Major Comment 5). The L1Hs subfamilies could also be included in the graph of Figure 2 to support the conclusions of remaining active old L1Hs at allelic forms in the human genome.

      • Upon consideration of this helpful comment, we now augment the presentation of our in vitro activity data with a remade Figure 2 with boxplots to show the variation of the data, as well as a horizontal dashed line showing the active-cutoffs and star signs showing which LINE-1s belong to L1Hs or L1PA2.
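      The active/inactive call behind such a cutoff line can be sketched in a few lines. This is a hedged illustration only: the 5% of L1RP threshold is the value the reviewer mentions, but summarising the three clones by their median, treating the cutoff as inclusive, and the function and constant names are assumptions rather than the authors' documented procedure.

```python
# Illustrative only: call a LINE-1 "active" or "inactive" in vitro from
# per-clone activities expressed as a percentage of the L1RP control.
# Using the median across clones and an inclusive cutoff are assumptions.
from statistics import median

ACTIVE_CUTOFF_PERCENT = 5.0  # percent of L1RP activity, as cited by the reviewer

def classify_in_vitro(clone_activities_percent):
    """Return 'active' if the median clone activity meets the cutoff."""
    if not clone_activities_percent:
        raise ValueError("at least one clone measurement is required")
    summary = median(clone_activities_percent)
    return "active" if summary >= ACTIVE_CUTOFF_PERCENT else "inactive"

# Toy, made-up measurements for two hypothetical elements (three clones each).
low_element = classify_in_vitro([0.5, 2.0, 1.0])
high_element = classify_in_vitro([40.0, 120.0, 80.0])
```

      A median is used here because it tolerates one discrepant clone out of three; a mean or a per-clone rule would be equally easy to substitute.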

      2 - Also, the validation of cloning is not well described. The choice of PCR validation must be supported by more technical details on the design of the primers used to validate each copy. The authors should clearly state that the strategy chosen for retrotransposition assay does not rely on the transcription from LINE-1 5UTR but from an upstream strong promoter, ruling out the role of potential mutations in LINE-1 promoter.

      • As detailed above in the response to ‘Major Comment 1’, we used a combination of end sequencing, whole plasmid sequencing, and multi-read Sanger sequencing to validate the sequences of each LINE-1 cloned from a CHM1 clone. When cloning each LINE-1, we used a specific set of primers designed for the ends of the UTRs for each LINE-1. We have updated the methods and text to clarify this cloning step, and the sequences of these oligos are included in Table S2.
      • To clarify the fact that our retrotransposition assays use a common, strong promoter, we added text in several places stating this setup and discussing (paragraph that starts at page 11, right column, line 18) how 5'UTRs and other non-ORF factors can affect the rate of LINE-1 in vitro activity.

      3 - There are discrepancies with the reported numbers of LINE-1s between Figure 1A and Table S1: 154 vs. 151 in CHM1, 144 vs. 143 in CHM13, respectively.

      • We thank the reviewer for spotting this error on our part. The numbers in Figure 1 and the main text were correct, and we have revised Table S1 to reflect this data.

      4 - The choice of colors in Figure 3 is not perfectly clear and sometimes not as reported in the text (green highlight and orange highlight). Part of the Figure 3 legend is missing. It should include a description of the color code chosen for the right histogram.

      • We thank the reviewer for bringing this inconsistency to our attention. Based upon feedback from all reviewers, we have simplified the color scheme in Figure 3 and Figure 5 to focus on the core conclusions of these two figures. Specifically, in Figure 3, we have removed the quadrant shading and more clearly presented the cutoffs of ‘polymorphic/high frequency’ and ‘in vitro active/inactive’ as dashed lines in the scatter plot. In Figure 5, we have simplified to two colors – black for in vivo unfit and orange to show the in vivo fit LINE-1s which is also used in Figure 4 to show the definition of in vivo activity. These updated colors are now defined in the figure legends and main text, and we have made references to these colors consistent throughout.

      5 - For Figure 4, it would be useful to define in the legends the color code for the top histogram. To better read the scatter plot, the words "fit" and "unfit" could be added on each side of the vertical dashed line.

      • We thank the reviewer again for suggestions to improve the clarity of our figures. As mentioned above in ‘Minor comment 1’, we have removed unnecessary colors including the gradient of the histograms in Figure 3 and Figure 4, since the boundaries of each bin are already defined by the axis labels and ticks. As suggested, we have also added ‘fit’ and ‘unfit’ labels to the dashed cutoff line in Figure 4 to clarify the meaning of this line.

      6 - In panel B of Figure 5, it seems that the color code and hot/cold description is not fully formatted.

      • This formatting error has been corrected.

      Reviewer #1 (Significance (Required)):

      In this article, Yang and colleagues present an unprecedented view of the allelic diversity of young LINE-1 copies related to variable retrotransposition activity in an individual genome. One key aspect of their work is the description of the presence of young active LINE-1 alleles that are absent or non-intact in other genome assemblies, while described at a lower scale in initial work from the Kazazian and Moran labs, cited in the manuscript. The work of Yang et al. demonstrates the requirement of multiple approaches and long-read-based sequencing of individual genomes to fully infer the mutagenesis risk of LINE-1 activity.

      The data and methods provided by the authors open the door to a more systematic analysis of mutations and rare allelic forms to understand both mechanistic aspects and evolution of LINE-1 retrotransposition in the human genome. The identification of rare allelic forms of old LINE-1 that retain activity despite previously being considered as inactive is particularly interesting in the light of LINE-1 evolution in the human genome. The authors also describe allelic diversity inside of the Ta1d subfamily, suggesting further diversification and emergence of LINE-1 subgroups. Together with the identification of nucleotide polymorphism among LINE-1 copies, these findings strengthen the notion of individual genomes with individual set of potentially mutagenic LINE-1 alleles.

      The findings and methods described in this article are of great interest to a wide audience including the fields of research focusing on human genome evolution, transposable elements, genomic instability, human genetic variation, and personalized medical diagnostics.

      Aurélien J. Doucet CNRS - Université Côte d'Azur

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This manuscript is an interesting and well-crafted study of LINE-1 activity at the level of a single human genome using long-read-based haploid assemblies. The manuscript has some real gems and addresses critical aspects of LINE-1 biology that are typically not rigorously examined. The authors are to be commended for undertaking this exercise and for providing interesting perspectives that challenge the dogma that dominates the field in several areas. Despite the noted strengths of the contributions, the manuscript ignores the clear limitations inherent to the approaches taken and at times appears as dogmatic as the dogma that they themselves are trying to challenge. These deficiencies should be addressed before this manuscript is published.

      • We thank Reviewer 2 for their enthusiastic appreciation of the value and innovation of our manuscript. We also thank the reviewer for encouraging us to carefully consider the missing references relevant to our findings. We have had two researchers with experience in relevant fields edit our text for readability, clarity, and proper inclusion of relevant references. We have added these throughout and taken care to replace ‘dogmatic’ statements with clear presentations of the data and thorough referencing of the relevant literature.

      Several major and minor points to consider during revision include:

      Major:

      1. Several strategies have been published in the past that have confidently assigned LINE-1s to specific loci despite use of shorter reads. These works should be acknowledged, even if as stated in the manuscript, use of longer reads will only continue to add confidence and validity to future assignments.

      • We thank the reviewer for this suggestion, and we apologize for the omission of these important publications. As noted above, we have added numerous relevant references (references 17-27 in the revised text) throughout the text including previous work that used short reads to confidently assign polymorphic/non-reference LINE-1s to specific loci. For example, we now cite the MELT pipeline to detect de novo L1 insertions with short reads (PMID: 28855259), and Iskow et al. 2010, which detects LINE-1s with junction fragment sequencing (PMID: 20603005). We have also added additional text to clarify that short reads are, indeed, often sufficient to place new LINE-1 insertions, while long reads are especially useful for resolving the sequence and location of these insertions. The new text (page 2, left column, line 22-30) presents the advantages/disadvantages of both short reads and long reads.

      2. One of the important requirements for precise quantification of LINE-1 activity and predicted risk scores cited in the manuscript was the need to predict activity based on sequence and location. This requirement, as posited in the manuscript, ignores the critical role of epigenetic control in the regulation of LINE-1 activity. As such, a discussion that acknowledges the critical roles of histone and DNA covalent modifications, and that integrates epigenomic insight into predictions of LINE-1 activity must be included in the manuscript.

      • We thank the reviewer for suggesting this important discussion point. In response, we have expanded our discussion of this topic to place our data in the context of other literature on the effects of epigenomic regulation on in vivo LINE-1 activity, including histone and DNA modifications, as well as the effects of post transcriptional restriction factors (paragraph starting at page 11, right column, line 42).

      3. The limitations associated with the use of the CHMI were not addressed in the manuscript. While CHMI contain a paternal only genome, with no maternal contribution, the moles may arise from fertilization of an anuclear empty ovum by a haploid 23,X sperm or fertilization by two sperm giving rise to 46,XX or 46,XY karyotype. As such, generalizable conclusions about CHMI genetics should be carefully made given that the loss of maternal epigenetic imprinting and gain of paternally imprinted expression may result in abnormal gene expression, including that of LINE-1s. These variances will in turn impact LINE-1 activity profiles.

      • We thank the reviewer for pointing out this confusingly written section of our manuscript, and we agree with the reviewer that LINE-1 activity measurements could be complicated in the CHM cell lines; however, all of our retrotransposition assays were carried out in the common background of 293T cells (chosen because of their low expression of known LINE-1 restriction factors (PMID: 25182477)). We have modified the text (page 11, right column, line 52) to clarify these points.

      Minor

      1. Important citations of previously published work are not properly referenced throughout the manuscript. These are too numerous to identify individually, but the authors should carefully read the manuscript to ensure that proper documentation and reference to previous work is duly acknowledged.

      • Please see our above response to ‘Major point 1’.

      2. There are several typos and missing prepositions that should be corrected. For instance, on page 7, the word "great" should be "greater".

      • Please see our above response to ‘Major point 1’ and Reviewer 1’s ‘Major comment 3’ for details on our in depth editing of the manuscript.

      Reviewer #2 (Significance (Required)):

      The contribution is highly significant as it challenges previously held concepts and advances our understanding of critical structure and function relationships of LINE-1s.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Yang et al. perform an in-depth analysis of potentially mobile source L1 alleles in a single human genome (CHM1) previously subjected to Pacbio whole genome sequencing. The retrotransposition efficiencies of source L1 alleles with intact ORFs were tested in vitro, and these efficiencies compared to a model of in vivo activity based on Hamming distance to other ORF-intact L1 alleles. Comparisons of CHM1 L1 alleles are made to CHM13 (used for the recent T2T reference assembly), and also to population-scale sequencing efforts to establish how widespread each source L1 allele is. These data showcase the advantages of being able to resolve L1 alleles with long-read sequencing, allowing the field to make much more accurate predictions of retrotransposition potential in a given genome. The core analyses appear robust and for the most part enough detail is provided to follow what was done.

      • We thank Reviewer 3 for their in depth reading and analysis of our manuscript and data, and for their enthusiasm about the importance of this work in the context of foundational research from their lab and many others in the field. We have carefully considered each comment and completed several new analyses of our data and related data from other publications. We feel that our manuscript is much improved with this new data, as detailed below.

      Comments:

      1) The text overlooks the potential importance of L1 5'UTR mutations in L1 activity and evolution, as per PMID:25274305, PMID:1701022, and other studies, as well as the impact of genomic context on source L1 activity, as per PMID:27016617, PMID: 33186547 etc. L1 promoter evolution is arguably a major driver of L1 lineage emergence.

      • We thank the reviewer for suggesting these important additions. To present the relevance of 5'UTR mutations on LINE-1 activity and evolution, we added a discussion paragraph (paragraph starting at page 11, right column, line 16) to address how 5'UTRs and other non-ORF factors can affect the rate of LINE-1 in vitro activity. Several key references have been added and discussed in the paragraph: PMID:25274305 reported the regulation of human LINE-1 by the evolution of its 5'UTR; PMID:1701022 was one of the earliest papers that found the effect of the 5'UTR promoters on human LINE-1 retrotransposition; PMID: 27016617 and PMID: 33186547 reported specific L1 loci regulated by different promoters and were included in the discussion; PMID:9430649 was one of the examples of non-human LINE-1 lineages emerging because of different promoters and was cited in the added discussion paragraph. We have also added discussion points to make clear that genomic context has a clear role in the activity of source LINE-1s (paragraph starting at page 11, right column, line 42).

      2) The way the retrotransposition assay is done here (I think) removes parts of the UTRs as part of introducing L1s into retrotransposition vectors, meaning that the assay tests the biochemical activity of the ORFs. It would be helpful to readers to have a more detailed method for this assay, including the origins of the reporter plasmids, whether there is a CMVp boosting the L1 promoter etc, and some clarity about how much of each L1 was cloned into the assay.

      • We have added relevant details to the results (page 6, left column, line 5), discussion (page 11, right column, line 52), and methods (page 13, right column, lines 16 and 30) sections to clarify the reviewer’s important points. The LINE-1s tested for in vitro activity were cloned in their entirety (UTRs and ORFs) and were driven by both their native promoters in the 5'UTR and an upstream CMV promoter. Also, please see our response to Reviewer 1 ‘Minor comment 2’ above.

      3) PacBio long-read sequencing has been used previously to locate and characterise L1 alleles in human DNA. The Introduction states: "These represent the first scalable methods to catalog LINE-1 locations and sequences in individual human genomes". The "first" here is questionable. Citations to PMID:31853540 and PMID:34772701 should be included. The latter is particularly relevant as it not only resolves source L1 sequences with PacBio sequencing but also summarises their retrotransposition efficiencies in vitro and population frequencies.

      • We apologize for leaving out these and other important references, and we agree that the “first” claim is unnecessary. We have added the references suggested by the reviewer as well as several other important references, as detailed in the above response to Reviewer 2 ‘Major point 1’. In addition, we have revised the adjacent text and deleted any description of our work as the “first” of these approaches.

      4) I am very interested in the two source L1s (on chr7 and chr9) that were found here to be more active in vitro than L1RP (to my knowledge the most active such element isolated to date, or close to it). Is there anything unusual about these two L1s? A quick look at the supplemental suggested the chr9 element was 5' truncated, was it tested as such in vitro? Also I think it would be worth contrasting the assay (all in HEKs) used here to test efficiency with the assay used by Brouha ... I feel readers may be surprised to find two L1s more mobile than L1RP in one genome.

      • To provide more details about the two active L1s (chr7 and chr9), we investigated key changes that could be related to the in vitro activity of these elements and now show them in Figure 2B and File S3. During this updated analysis, and while making the modifications to Figure 2 suggested by this reviewer and Reviewer 1, we saw that the chr7 L1 mentioned here had one very high activity measurement pulling its activity above that of L1RP. We therefore decided to normalize our data more rigorously by using the positive and negative controls across all plates of each day, instead of normalizing to the controls of individual plates as we had previously done. In addition, for any L1 with discrepant activity among the three clones we assayed, we used whole-plasmid sequencing to confirm the identity and consistency of all three clones. In three cases, we found that one or two of the three clones were the wrong L1, and we excluded them from the in vitro activity calculation. After this validation and testing of additional clones, all clones from the same L1 have consistent in vitro activity (see updated Figure 2). The updated in vitro activity of the chr7 L1 is 86.7% of L1RP and that of the chr9 L1 is 261.4% of L1RP; in addition, the chr17 LINE-1 has 117% of L1RP activity, and two additional LINE-1s have near-L1RP activity levels (Table S2, column S). These changes in L1 activity have been updated in the text, figures, and supplemental materials. We also note that the chr9 element is 6019 bp in length and was tested as such in vitro. Current work in the lab is attempting to understand the mechanisms of increased LINE-1 in vitro and in vivo activity, as described in detail in our response to Reviewer 1’s ‘Major comment 5’.
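
      For readers who want to see what day-level normalization could look like in practice, the following is a minimal sketch and not the analysis code used for the manuscript. It assumes that activity is expressed as a percentage of the positive control (L1RP) after subtracting the negative control, with both controls averaged over every plate run on the same day. The function name `normalize_by_day`, the tuple layout, the construct name `L1_x`, and all readings are hypothetical and chosen only for illustration.

```python
from statistics import mean


def normalize_by_day(readings, positive="L1RP", negative="NEG"):
    """Return mean activity per construct as a percentage of the positive control.

    `readings` is a list of (day, plate, construct, signal) tuples. Controls are
    pooled across all plates of the same day (assumed formula, see lead-in).
    """
    # Collect control wells per day, ignoring which plate they came from.
    controls = {}
    for day, _plate, construct, signal in readings:
        if construct in (positive, negative):
            day_controls = controls.setdefault(day, {positive: [], negative: []})
            day_controls[construct].append(signal)

    # Scale every non-control well against the day-level control means.
    scaled = {}
    for day, _plate, construct, signal in readings:
        if construct in (positive, negative):
            continue
        pos = mean(controls[day][positive])
        neg = mean(controls[day][negative])
        scaled.setdefault(construct, []).append(100.0 * (signal - neg) / (pos - neg))

    return {construct: mean(values) for construct, values in scaled.items()}


# Invented toy readings: two plates (A and B) measured on the same day.
readings = [
    (1, "A", "L1RP", 100.0), (1, "A", "NEG", 0.0), (1, "A", "L1_x", 250.0),
    (1, "B", "L1RP", 120.0), (1, "B", "NEG", 10.0), (1, "B", "L1_x", 280.0),
]
activity = normalize_by_day(readings)
```

      Pooling controls per day means that a single outlying control well shifts every construct measured that day by the same factor, rather than distorting only the constructs that shared its plate.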

      5) In several places it is mentioned how L1 alleles may differ from sequences provided in reference assemblies, and may therefore explain discrepancies between assay results here and in other studies (e.g. Brouha). The Seleme and Lutz papers are correctly mentioned here, but arguably the most complete demonstration of this concept, from PMID:31230816, is overlooked. This study reports a chr13 source L1 that was previously found to be inactive by Brouha, and with broken ORFs in the reference genome, has both mobile and immobile alleles in the human population. This L1 is actually in CHM13, but not CHM1, and is "hot" in some individuals and not others. There are several places in the manuscript where this earlier study is very relevant and it would be fair to ask it to be mentioned, especially as the results are concordant. The same concept is reinforced by an even more recent paper (PMID:35728967), except in macaque, showing that this is a general consideration for primate L1 lineages, and actually that source L1 is relatively old and yet jumps extremely well in vitro, which fits an observation made in the present study. Mutually supporting observations like these really add confidence that what is reported in the present study is robust.

      • We thank the reviewer for their suggestion to include these highly relevant and important papers; we apologize for this initial omission. We have now added several sentences to the introduction and discussion (top left paragraph page 11) in addition to citations of these papers.

      6) Hamming distance between ORF-intact source L1 alleles is used to assess in vivo activity. This seems reasonable. However, in other works, transductions have been used to identify families of very closely related L1s. I realise that many highly mobile source L1s will rarely generate insertions carrying transductions, and yet I wonder if any of the youngest L1s in the present study form transduction families, and whether estimates of in vivo activity based on transductions found in population-scale data would reconcile better with in vitro retrotransposition assay data.

      • We thank the reviewer for pointing out our exclusion of data on 3' transductions, the most commonly used surrogates of in vivo activity, while also acknowledging that only a small percentage of new L1 retrotranspositions carry 3' transductions. Please see our above response to Reviewer 1’s ‘Major comment 1’ for details on our newly added comparison of our in vivo activity data with the 3' transduction-based somatic LINE-1 retrotransposition landscapes reported in PMID:34772701, PMID:32024998 and PMID:25082706.
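
      As a reminder of the metric discussed in this comment, the Hamming distance between two aligned sequences is the number of positions at which they differ. The sketch below is illustrative only: the allele names and the short sequences are hypothetical, and it assumes the sequences have already been aligned to equal length, which is not a step shown here.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(seqs):
    """Return {(name_1, name_2): distance} for every pair of named sequences."""
    return {
        (m, n): hamming(seqs[m], seqs[n])
        for m, n in combinations(sorted(seqs), 2)
    }


# Hypothetical toy alleles, far shorter than a real ~6 kb LINE-1.
alleles = {"L1_a": "ACGTACGT", "L1_b": "ACGTACGA", "L1_c": "TCGTACGA"}
distances = pairwise_hamming(alleles)
```

      Small pairwise distances between ORF-intact alleles are what one would expect from recently active elements, which is the intuition behind using this measure as a proxy.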

      7) In the Introduction, it is stated that L1 only transmits vertically. It may be prudent to mildly qualify this position, based on PMID:29983116.

      • The referenced text in the introduction has been changed from "LINE-1s only transmit vertically" to "LINE-1s generally transmit vertically with few exceptions", with the addition of the suggested citation.

      8) A column in Table S2 looks mislabelled: Column R should be CHM1 not CHM13?

      • We thank the reviewer for seeing this error. Column P (Column R in the previous version) of Table S2 is now correctly labeled as "CHM1 L1 intactness".

      Geoff Faulkner (University of Queensland)

      Reviewer #3 (Significance (Required)):

      This is a well-executed study of considerable interest to the mobile DNA field, and anyone working with long-read DNA sequencing. Its strengths are the genomic and bioinformatic analysis, leveraging the PacBio long-read data and BAC library available for CHM1 to full effect. One limitation (in current form) is its near-exclusive focus on ORFs to encapsulate how mobile a given L1 allele is, when genomic context and L1 promoter mutations could also contribute heavily. Although I liked the manuscript very much and enjoyed reviewing it, some of the conceptual advances are encroached upon by other work (including some very relevant and yet uncited literature). These issues can very likely be addressed via a revision, additional analyses may be required but not new experiments.

      Geoff Faulkner (University of Queensland)

    1. Author Response

      We would like to thank the reviewers for their positive and constructive comments on the manuscript.

      We are planning the following revisions to both DGRPool and the corresponding manuscript to address the reviewers’ comments:

      1) We agree with reviewer #1 that normalizing the data could potentially improve the GWAS results. Thus, we plan to explore the implementation of this option and assess its impact on the overall results. We will also investigate replacing the ANOVA test with a Kruskal-Wallis test. Instead of upfront data normalization, we will consider using the PLINK --pheno-quantile-normalize option. Both options will be compared on a set of phenotypes where we can analyze the output (i.e., phenotypes for which we expect to find specific variants), to determine whether these strategies enhance the detection power.

      2) We also agree with both reviewers that gene expression information is of interest. However, we recognize that incorporating such information would entail substantial work (as elaborated in our response to comments below). We feel that this extensive work is beyond the current scope of this paper, which primarily focuses on phenotypes and genotype-phenotype associations. Nonetheless, we are committed to enhancing user experience by including more gene-level outlinks to Flybase. Additionally, we will link variants and gene results to Flybase's online genome browser, JBrowse. By following the reviewers' suggestions, we aim to guide DGRPool users to potentially informative genes.

      3) In agreement with reviewer #2, we acknowledge that additional tools could enhance DGRPool's functionality and facilitate meta-analyses for users. Therefore, we are in the process of developing a gene-centric tool that will allow users to query the database based on gene names. Moreover, we intend to integrate ortholog databases into the GWAS results. This feature will enable users to extend Drosophila gene associations to other species if necessary.

      4) Finally, we also concur with both reviewers about making minor edits to the manuscript to address their feedback.

      Reviewer #1 (Public Review):

      This is a technically sound paper focused on a useful resource around the DGRP phenotypes, which the authors have curated, pooled, and made available through a user-friendly website. This is intended to become a crowd-sourced resource in the future.

      The authors should make sure they coordinate as well as possible with the NC datasets and community and broader fly community. It looks reasonable to me but I am not from that community.

      We thank the reviewer for the positive comments. We are relatively well-connected to the D. melanogaster community and aim to leverage this connection to make the resource as valuable as possible. DGRPool in fact already reflects the input of many potential users and was also inspired by key tools on the DGRP2 website. This is also why we often bridge our results with other resources, for example by linking out to Flybase, which is the main resource for the Drosophila community at large.

      I have only one major concern which in a more traditional review setting I would be flagging to the editor to insist the authors did on resubmission. I also have some scene setting and coordination suggestions and some minor textual / analysis considerations.

      The major concern is that the authors do not comment on the distribution of the phenotypes; it is assumed it is a continuous metric and well-behaved - broad gaussian. This is likely to be more true of means and medians per line than individual measurements, but not guaranteed, and there could easily be categorical data in the future. The application of ANOVA tests (of the "covariates") is for example fragile for this.

      The simplest recommendation is in the interface to ensure there is an inverse normalisation (rank and then project on a gaussian) function, and also to comment on this for the existing phenotypes in the analysis (presumably the authors are happy). An alternative is to offer a kruskal test (almost the same thing) on covariates, but note PLINK will also work most robustly on a normalised dataset.

      We thank the reviewer for raising this interesting point. Indeed, we did not comment on the distribution of individual phenotypes due to the underlying variability from one phenotype to another, as suggested by the reviewer. Some distributions appear normal, while others are clearly not normally distributed. This information is 'visible' to users by clicking on any phenotype; DGRPool automatically displays its global distribution if the values are continuous/quantitative. We acknowledge the reviewer's concerns regarding the use of ANOVA tests. However, we consider it acceptable to perform linear regression (including ANOVA tests) on non-normally distributed data, as only the prediction errors need to follow a normal distribution.

      Furthermore, the ANOVA test is solely conducted to assess whether any of the potential covariates (such as well-established inversions and symbiont infection status) are associated with the phenotype of interest. PLINK2 automatically corrects for the effects of these covariates during GWAS by considering them as part of the regression model.

      Nevertheless, we concur with the reviewer that normalizing the data could potentially enhance GWAS results. Consequently, we commit to exploring the impact of data normalization on the overall outcomes. Additionally, we will consider replacing the ANOVA test with a Kruskal-Wallis test, and using the PLINK --pheno-quantile-normalize option. We intend to compare both approaches using a set of phenotypes where we can compare the output (i.e., where specific variants are expected to be identified). This comparison will help us determine whether either method enhances the detection power.
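
      The "rank and then project on a gaussian" transformation suggested by the reviewer can be sketched as follows. This is a generic rank-based inverse normal transform written with the Python standard library; it is not DGRPool code and is not claimed to reproduce PLINK's exact procedure. The function name, the 0.5 rank offset, the tie handling (average ranks), and the phenotype values are assumptions made for illustration.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.5):
    """Map values to standard-normal quantiles based on their ranks.

    Ties receive the average of the ranks they span. The quantile used for
    rank r out of n values is (r - offset) / n, which stays strictly in (0, 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]


# Hypothetical, strongly skewed line means for one phenotype.
phenotype = [3.1, 0.2, 50.0, 1.5, 1.5]
transformed = inverse_normal_transform(phenotype)
```

      Because only ranks are used, the extreme value 50.0 ends up no further from the centre than the smallest value 0.2, which is why this transform makes downstream regression less sensitive to outliers, at the cost of discarding the original scale.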

      Minor points:

      On the introduction, I think the authors would find the extensive set of human GWAS/PheWAS resources useful; widespread examples include the GWAS Catalog, Open Targets PheWAS, MR-base, and the FinnGen portal. The GWAS Catalog also has summary statistics submission guidelines, and I think where possible meta-data harmonisation should be similar (not a big thing). Of course, DGRP has a very different structure (line and individuals) and of course, raw data can be freely shown, so this is not a one-to-one mapping.

      Thank you for the suggestion. We will cite these resources in the Introduction and check the GWAS catalog submission guidelines to compare to the ones we are proposing in this paper.

      For some authors coming from a human genetics background, they will be interpreting correlations of phenotypes more in the genetic variant space (e.g., LD score regression), rather than a more straightforward correlation between DGRP lines of different individuals. I would encourage explaining this difference somewhere.

      We appreciate this potential issue and we will make this distinction clearer in the manuscript to avoid any confusion.

      This leads to an interesting point that the inbred nature of the DGRP allows for both traditional genetic approaches and leveraging the inbred replication; there is something about looking at phenotype correlations through both these lenses, but this is for another paper, which I suspect this harmonised pool of data can help.

      We agree with the reviewer and hope that more meta-analyses will be made possible by leveraging the harmonized data that are made available through DGRPool.

      I was surprised the authors did not crunch the number of transcript/gene expression phenotypes and have them in. Is this because this was better done in other datasets? Or too big and annoying on normalisation? I'd explain the rationale to leave these out.

      This is a very good point raised by the reviewer, and this is in fact something that we initially wanted to do. However, to render the analysis fair and robust, it would require processing all datasets in the same way. This implies cataloging all existing datasets and processing them through the same pipeline. Then, it also requires adding a “cell type” or “tissue” layer, because gene expression data from whole flies is obviously not directly comparable to gene expression data from specific tissues or even specific conditions. This would be key information as phenotypes are often tissue-dependent. So, as implied by the reviewer, we deemed this too big of a challenge beyond the scope of the current paper. Nevertheless, we plan to continue investigating this avenue, especially given the strong transcriptomics background of our lab, in a potential follow-up paper.

      I think 25% FDR is dangerously close to "random chance of being wrong". I'd just redo this section at a higher FDR, even if it makes the results less 'exciting'. This is not the point of the paper anyway.

      We agree with the reviewer that this threshold implies a higher risk of false positive results. However, it is not an uncommon threshold (Li et al., PLoS Biology, 2008; Bevers et al., Nature Metabolism, 2019; Hwangbo et al., eLife, 2023), and it seems robust enough in our analysis since similar phenotypes are significant in different studies. Nevertheless, we will revisit these results and explore how a more stringent threshold may impact the results.
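
      To illustrate how the choice of FDR threshold changes the set of reported associations, the sketch below applies the Benjamini-Hochberg adjustment to a handful of p-values and compares a 25% cut-off with a 5% cut-off. The source does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the p-values are invented for illustration only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Invented p-values for five hypothetical phenotype-phenotype tests.
pvalues = [0.001, 0.01, 0.04, 0.15, 0.60]
qvalues = benjamini_hochberg(pvalues)

hits_at_25 = [q <= 0.25 for q in qvalues]
hits_at_05 = [q <= 0.05 for q in qvalues]
```

      In this toy case four tests pass at 25% FDR but only two pass at 5%, which is the kind of shrinkage one would check for when revisiting the results with a more stringent threshold.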

      I didn't buy the extreme line piece as being informative. Something has to be on the top and bottom of the ranks; the phenotypes are an opportunity for collection and probably have known (as you show) and cryptic correlations. I think you don't need this section at all for the paper and worry it gives an idea of "super normals" or "true wild types" which ... I just don't think is helpful.

      This section of the paper was intended to investigate anecdotal evidence suggesting that certain DGRP lines consistently rank at the top or bottom when examining fitness-related traits. If accurate, this observation could imply that inbreeding might have made these lines generally weaker, potentially introducing bias into studies aimed at uncovering the genetic basis of complex traits. However, as per the analyses presented, we did not discover support for this phenomenon. Nevertheless, we consider this message important to convey. In response to the reviewer's feedback, we intend to provide a clearer explanation of the reasoning behind this section of the paper and its main conclusion.

      I'd say "well-established inversion genotypes and symbiont levels" rather than generic covariates. Covariates could mean anything. You have specific "covariates" which might actually be the causal thing.

      Thank you. We will update the manuscript accordingly.

      I wouldn't use the adjective tedious about curation. It's a bit of a value judgement and probably places the role of curation in the wrong way. Time-consuming due to lack of standards and best practice?

      Thank you. We will update the manuscript accordingly.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, Gardeux et al. provide a web-based tool for curated association mapping results from DGRP studies. The tool lets users view association results for phenotypes and compare mean phenotype ~ phenotype correlations between studies. In the manuscript, the authors provide several example utilities associated with this new resource, including pan-study summary statistics for sex, traits, and loci. They highlight cross-trait correlations by comparing studies focused on longevity with phenotypes such as oxphos and activity.

      Strengths:

      -Considerable efforts were dedicated toward curating the many DGRP studies provided.

      -Available tools to query large DGRP studies are sparse, and so new tools have appeal.

      Weaknesses:

      The creation of a tool to query these studies for a more detailed understanding of physiologic outcomes seems underdeveloped. These could be improved by enabling usages such as more comprehensive queries of meta-analyses, molecular information to investigate given genes or pathways, and links to other information such as in mouse rat or human associations.

      We appreciate the reviewer's kind comments.

      Regarding the tools, we concur with the reviewer that incorporating additional tools could enhance DGRPool and facilitate users in conducting meta-analyses. Therefore, we intend to introduce a gene-centric tool that enables users to query the database based on gene names. Additionally, we will establish links to ortholog databases within the GWAS results, thereby allowing users to extend fly gene associations to other species, if required.

      Furthermore, we have plans to link out to a 'genome browser-like' view (Flybase’s JBrowse tool) of the GWAS results centered around the affected variants/genes. We are considering integrating this feature into the new gene-centric tool as well.

      Another potential downstream analysis we are considering is gene-set enrichment. This analysis would involve assessing the enrichment of genes in Gene Ontology or other pathway databases directly from the GWAS results page.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the positive feedback of the reviewers and have modified the manuscript to address their comments, including changes to the text, figures, and methods. We believe that these revisions have strengthened and improved the manuscript. Reviewers’ comments in blue and detailed responses in black are below.

      Reviewer #1 Weaknesses:

      • Is "function" of the ISNs to balance "nutrient need" or osmolarity? Balancing hemolymph osmolarity for physiological homeostasis is conceptually different from balancing thirst and hunger.

      We have added the following text to the introduction to address this: “Thus, the ISNs sense both AKH and hemolymph osmolality, arguing that they balance internal osmolality fluctuations and nutrient need (Jourjine, Mullaney et al., 2016).” (ln 80-82).

      • The final schematic nicely sums up how the different peptidergic pathways might work together, but it is unclear which connections are empirically-validated or speculative. It would be informative to show which parts of the model are speculative versus validated. For example, does FAFB volume synapse = functional connectivity and not just anatomical proximity? A bulk of the current manuscript relies on "synapses of relatively high confidence" (according to Materials and methods: line 522). I recommend distinguishing empirically tested & predicted connections in the final schematic, and maybe reword/clarify throughout the manuscript as "predicted synaptic partners"

      We modified the schematic to clarify EM based connections versus functionally validated connections. We also clarified the EM predicted synaptic partners, using “predicted synaptic partners” throughout the manuscript.

      Reviewer #2 Areas for further development:

      • Does BIT inhibit all of the IPCs or some of them? I think it is critical to indicate the ROIs used for each neuron in the methods. Which part of the neuron is used for imaging experiments? Dendrites, cell bodies, or synaptic terminals?

      ROIs used for quantification are described in the figure legends: “ArcLight response of BiT soma…” (Fig 2, Fig S2), “Calcium responses of CCHa2R-RA neurites in SEZ…” (Fig 4), “Calcium response of CCHa2R-RA SEZ neurites…” (Fig S4), “Calcium response of CCAP neurites…” (Fig 5, Fig S5), “Calcium response of all IPC somas…” (Fig S3). We have added ROIs used for quantification to the ‘In vivo calcium imaging’ and the ‘In vivo voltage imaging’ methods sections (ln 493-494).

      • The discussion section is not giving big picture explanation of how these neurons work together to regulate sugar and water ingestion. Silencing and activation experiments are good, but without showing the innate activity of these neural groups during ingestion, it is not clear what their functions are in terms of regulating fly behavior.

      We agree that how these peptidergic neurons coordinately regulate feeding is unclear. As peptide signals may act at a distance and may cause long-lasting neural activity state changes, studying their integration over space and time is challenging. Acute imaging during feeding would only in part address this challenge, as cumulative changes in nutrient need signals may impart circuit changes that are not apparent by monitoring the acute activity of peptidergic neurons. We modified a paragraph in the discussion to address this (ln 434-443).

      “Overall, our work sheds light on neural circuit mechanisms that translate internal nutrient abundance cues into the coordinated regulation of sugar and water ingestion. We show that the hunger and thirst signals detected by the ISNs influence a network of peptidergic neurons that act in concert to prioritize ingestion of specific nutrients based on internal needs. We hypothesize that multiple internal state signals are integrated in higher brain regions such that combinations of peptides and their actions signify specific needs to drive ingestion of appropriate nutrients. As peptide signals may act at a distance and may cause long-lasting neural activity state changes, studying their integration over space and time is a future challenge to further illuminate homeostatic feeding regulation.”

      Reviewer #1 (Recommendations For The Authors):

      • For the final schematic figure, it may be informative to include nanchung and AKHR in the schematic.

      We now include this (Fig 6).

      • For the ingestion duration with optogenetic activation, I don't think the right way to represent the data is by normalizing them to the no LED control. I think it should show raw ingestion time. I understand that the normalized data make the figure "cleaner" (no need to show +/- LED separately) but I think visualization of the raw data is important.

      We now include this in a new Supplemental Figure (Fig S6).

      • Methods for ingestion with optogenetic activation should be detailed in the Methods section.

      We expanded upon this in the ‘Temporal consumption assay (TCA)’ methods section. (ln 461-466).

      Reviewer #2 (Recommendations For The Authors):

      1) I think the authors are not following the recommendations of the Flywire community which recommends that people who contributed to the tracing of neurons are offered authorship in the published papers. I see the authors are thanking other lab members who have done tracing for the neurons described in this study, but I would like them to clarify whether they are following the guidelines provided by Flywire.

      We followed the Flywire guidelines and contacted all Flywire users contributing more than 10% to neuron edits for permission to publish with acknowledgements (see Flywire guidelines https://docs.google.com/document/d/1bUkOB5JnT3u__JDvAoVDHJ3zr5NXQtV_63yx2w6Tcc/edit).

      2) The method section for voltage imaging is missing.

      We now include a section on voltage imaging (ln 496-498).

      3) ROIs for imaging are not indicated in the methods or in the figures. It is hard to judge what is the origin of neural activity plotted in the figures; are they imaging cell bodies, dendrites, or axons?

      ROIs used for quantification are described in the figure legends: “ArcLight response of BiT soma…” (Fig 2, Fig S2), “Calcium responses of CCHa2R-RA neurites in SEZ…” (Fig 4), “Calcium response of CCHa2R-RA SEZ neurites…” (Fig S4), “Calcium response of CCAP neurites…” (Fig 5, Fig S5), “Calcium response of all IPC somas…” (Fig S3). We have added ROIs used for quantification to the ‘In vivo calcium imaging’ and the ‘In vivo voltage imaging’ methods sections (ln 493-494).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would first like to thank the reviewers and the editor for their insightful comments and suggestions. We are particularly glad to read that our software package constitutes a set of “well-written analysis routines” which have “the potential to become very valuable and foundational tools for the analysis of neurophysiological data”. We have updated the manuscript to address their remarks where appropriate.

      Additionally, we would like to stress that tools of this kind are in continual development. As such, the manuscript offered a snapshot of the package at one point during this process, which in this case was several months ago at initial submission. Since then, several improvements have been implemented. The manuscript has been further updated to reflect these more recent changes.

      From the Reviewing Editor:

      The reviewers identified a number of fundamental weaknesses in the paper.

      1) For a paper demonstrating a toolbox, it seems that some example analyses showing the value of the approach (and potentially the advantage in simplification, etc over previous or other approaches) are really important to demonstrate.

      As noted by the first reviewer, the online repository (i.e. the GitHub page) conveys a better sense of the toolbox’s contribution to the field than the present manuscript. This is a fair remark but, at the same time, it is unclear how to illustrate this in a journal article without dedicating a great deal of page space to presenting raw code, whereas online tools offer an easier and clearer way to do this. As a work-around, our strategy was to illustrate some examples of data analysis in Figures 4&5 by pairing each illustrated processing step with the corresponding command line used by the Pynapple package. Each step requires a single line of code, meaning that one only needs to write three lines of code to decode a feature from population activity using a Bayesian decoder (Fig. 4a), compute the cross-correlogram of two neurons during a specific stimulus presentation (Fig. 4b), or compute the average firing rate of two neurons around a specific time of the experimental task (Fig. 4c). We believe that these visual aids make it unnecessary to add code in the main text of this manuscript. However, to aid reader understanding, we now provide clear references, in the figure legends as well as in the “Code Availability” section, to online Jupyter notebooks which show how each figure was generated.

      https://github.com/pynapple-org/pynapple-paper-2023

      Furthermore, we have opted in to the “Executable Research Articles” feature at eLife, which will make it possible to include live scripts and figures in the manuscript once it is accepted for publication. We do not know at this stage exactly what this entails, but we hope that Figures 4&5 will become live with this feature. Readers will then be able to see and edit the code directly within the online version of the manuscript.

      2) The manuscript's claims about not having dependencies seem confusing.

      We agree that this claim was somewhat unfounded. There are virtually no Python packages that do not have dependencies. Our intention was to say that the package has no dependencies outside the most common ones, which are Numpy, Scipy, and Pandas. Too many packages in the field have long lists of dependencies, making long-term back-compatibility quite challenging. By keeping dependencies minimal, we hope to maximise the package’s long-term back-compatibility. We have rephrased this statement in the manuscript in the following sections:

      Figure 1, legend.

      “These methods depend only on a few, commonly used, external packages.”

      Section Foundational data processing: “they are for the most part built-in and only depend on a few widely-used external packages. This ensures that the package can be used in a near stand-alone fashion, without relying on packages that are at risk of not being maintained or of not being compatible in the near future.”

      3) Given its significant relevance, it seems important to cite the FMATool and describe connections between it (or analyses based on it) and the presented work.

      Indeed, although we had already cited other toolboxes (including a review covering the topic comprehensively), we should have included this one in the original manuscript. Unfortunately, to the best of our knowledge, this toolbox is not citable (there is no companion paper). We have added a reference to it in plain text.

      4) Some discussion of integration between Pynapple and the rest of a full experimental data pipeline should be discussed with regard to reproducibility.

      This is an interesting point, and the third paragraph of the discussion somewhat broached this issue. Pynapple was not originally designed to pre-process data. However, it can, in theory, load any type of data stream after the necessary pre-processing steps. Overall, modularity is a key aspect of the Pynapple framework, and this is also the case for the integration with data pre-processing pipelines, for example spike sorting in electrophysiology and detection of regions of interest in calcium imaging. We do not think there should be a single integrated solution to the problem; instead, it should be possible to use any piece of code on data irrespective of their origin. This is why we focused on making data loading straightforward and easy to adapt to any particular situation. To expand on this point and make it clear that Pynapple is not meant to pre-process data but can, in theory, load any type of data stream after the necessary pre-processing steps, we have added the following sentences to the aforementioned paragraph:

      “Data in neuroscience vary widely in their structure, size, and need for pre-processing. Pynapple is built around the idea that raw data have already been pre-processed (for example, spike sorting and detection of ROIs).”

      5) Relatedly, a description of how data are stored after processing (i.e., how precisely are processed data stored in NWB format).

      We agree that this is a critical issue. NWB is not necessarily the best option as it is not possible to overwrite data in an NWB file. This would require the creation of a new NWB file each time, which is computationally expensive and time consuming. It also further increases the odds of write errors. Theoretically, users who need to store intermediate results in a flexible way could use any method they prefer, writing their own data files and wrappers to reload these data into Pynapple objects. Indeed, it is not easy to properly store data in an object-specific manner. This is a long-standing issue and one we are currently working to resolve.

      To do so, we are developing I/O methods for each of the Pynapple core objects. We aim to provide an output format that is simple to read and backward compatible in future Pynapple releases. This feature will be available in the coming weeks. Of note, while NWB may not be the central data format of Pynapple in future releases, it has become a central node in the neuroscience software ecosystem. Therefore, we aim to facilitate reading and writing in this format by developing a set of simple standalone functions.

      Reviewer #1 (Public Review):

      A typical path from preprocessed data to findings in systems neuroscience often includes a set of analyses that often share common components. For example, an investigator might want to generate plots that relate one time series (e.g., a set of spike times) to another (measurements of a behavioral parameter such as pupil diameter or running speed). In most cases, each individual scientist writes their own code to carry out these analyses, and thus the same basic analysis is coded repeatedly. This is problematic for several reasons, including the waste of time, the potential for errors, and the greater difficulty inherent in sharing highly customized code.

      This paper presents Pynapple, a python package that aims to address those problems.

      Strengths:

      The authors have identified a key need in the community - well-written analysis routines that carry out a core set of functions and can import data from multiple formats. In addition, they recognized that there are some common elements of many analyses, particularly those involving timeseries, and their object-oriented architecture takes advantage of those commonalities to simplify the overall analysis process.

      The package is separated into a core set of applications and another with more advanced applications, with the goal of both providing a streamlined base for analyses and allowing for implementations/inclusion of more experimental approaches.

      Weaknesses:

      There are two main weaknesses of the paper in its present form.

      First, the claims relating to the value of the library in everyday use are not demonstrated clearly. There are no comparisons of, for example, the number of lines of code required to carry out a specific analysis with and without Pynapple or Pynacollada. Similarly, the paper does not give the reader a good sense of how analyses are carried out and how the object-oriented architecture provides a simplified user interaction experience. This contrasts with their GitHub page and associated notebooks which do a better job of showing the package in action.

      As noted in the response to the Reviewing Editor and in the response to the reviewer’s recommendations to the authors below, we have now included links to Jupyter notebooks that show how the panels of Figures 4 and 5 were generated (https://github.com/pynapple-org/pynapple-paper-2023). However, we believe that including more code in the manuscript than what is currently shown (i.e. abbreviated calls to methods on top of panels in Figs 4&5) would decrease the readability of the manuscript.

      Second, the paper makes several claims about the values of object-oriented programming and the overall design strategy that are not entirely accurate. For example, object-oriented programming does not inherently reduce coding errors, although it can be part of good software engineering. Similarly, there is a claim that the design strategy "ensures stability" when it would be much more accurate to say that these strategies make it easier to maintain the stability of the code. And the authors state that the package has no dependencies, which is not true in the codebase. These and other claims are made without a clear definition of the properties that good scientific analysis software should have (e.g., stability, extensibility, testing infrastructure, etc.).

      Following the reviewer’s comment, we have rephrased and clarified these claims. We provide detailed responses to these remarks in the recommendations to authors below.

      There is also a minor issue - these packages address an important need for high-level analysis tools but do not provide associated tools for preprocessing (e.g., spike sorting) or for creating reproducible pipelines for these analyses. This is entirely reasonable, in that no one package can be expected to do everything, but a bit deeper account of the process that takes raw data and produces scientific results would be helpful. In addition, some discussion of how this package could be combined with other tools (e.g., DataJoint, Code Ocean) would help provide context for where Pynapple and Pynacollada could fit into a robust and reliable data analysis ecosystem.

      We agree that better explaining how Pynapple is integrated within data preprocessing pipelines is essential. We have clarified this aspect in the manuscript and provide more details below.

      Reviewer #1 (Recommendations For The Authors):

      Page 1

      • Title

      The authors should note that the application name- "Pynapple" could be confused with something from Apple. Users may search for "Pyapple" as many python applications contain "py" like "Numpy". "Pyapple" indeed is a Python Apple that works with Apple products. They could consider "NeuroFrame", "NeuroSeries" or "NeuroPandas" to help users realize this is not an apple product.

      We thank the referee for this interesting comment. However, we are not willing to make such a change at this point. The community of users has been growing over the last year and it seems too late to change the name. Of note, this is the first time such a comment has been made to us, and users and collaborators do not seem to confuse the package with any Apple product.

      • Abstract

      The authors mentioned that the Pynapple is "fully open source". It may be better to simply say it is "open source".

      We agree, corrected.

      Assuming the authors keep the name, it would be helpful if the full meaning of Pynapple - Python Neural Analysis Package was presented as early as possible.

      Corrected in the abstract.

      • Highlight

      An application being lightweight and standalone does not imply nor ensure backward compatibility. In general, it would be useful if the authors identified a set of desirable code characteristics, defined them clearly in the introduction, and then describe their software in terms of those characteristics.

      Thank you for your comment. We agree that being lightweight and standalone does not necessarily imply backward compatibility. Our intention was to emphasize that Pynapple is designed to be as simple and flexible as possible, with a focus on providing a consistent interface for users across different versions. However, we understand that this may not be enough to ensure long-term stability, which is why we are committed to regular updates and maintenance to ensure that the code remains functional as the underlying code base (Python versions, etc.) changes.

      Regarding your suggestion to identify a set of desirable code characteristics, we believe this is an excellent idea. In the introduction, we briefly touch upon some of the core principles that guided our development of Pynapple: a lightweight, stable, and simple package. However, we acknowledge that providing a more detailed discussion of these characteristics and how they relate to the design of our software would be useful for readers. We have added this paragraph in the discussion:

      “Pynapple was developed to be lightweight, stable, and simple. As simplicity does not necessarily imply backward compatibility (i.e. long-term stability of the code), Pynapple’s main objects and their properties will remain the same for the foreseeable future, even if the code in the backend may eventually change (e.g. not relying on Pandas in future versions). The small number of external dependencies also decreases the need to adapt the code to new versions of external packages. This approach favors long-term backward compatibility.”

      Page 2

      • The authors wrote -

      "Despite this rapid progress, data analysis often relies on custom-made, lab-specific code, which is susceptible to error and can be difficult to compare across research groups."

      It would be helpful to add that custom-made, lab-specific code can lead to a violation of FAIR principles (https://en.wikipedia.org/wiki/FAIR_datadata). More generally, any package can have errors, so it would be helpful to explain any testing regimens or other approaches the authors have taken to ensure that their code is error-free.

      We understand the importance of the FAIR principles for data sharing. However, Pynapple was not designed to handle data through their pre-processing. The only aspect that is somehow covered by the FAIR principles is the interoperability, but again, it is a requirement for the data to interoperate with different storage and analysis pipelines, not of the analysis framework itself. Unlike custom-made code, Pynapple will make interoperability easier, as, in theory, once the required data loaders are available, any analysis could be run on any dataset. We have added the following sentence to the discussion:

      “Data in neuroscience vary widely in their structure, size, and need for pre-processing. Pynapple is built around the idea that raw data has already been pre-processed (for example, spike sorting and ROI detection). According to the FAIR principles, pre-processed data should interoperate across different analysis pipelines. Pynapple makes this interoperability possible as, once the data are loaded in the Pynapple framework, the same code can be used to analyze different datasets”

      • The authors wrote -

      "While several toolboxes are available to perform neuronal data analysis ti–11,2ti (see ref. 29 for review), most of these programs focus on producing high-level analysis from specified types of data and do not offer the versatility required for rapidly-changing analytical methods and experimental methods."

      Here it would be helpful if the authors could give a more specific example or explain why this is problematic enough to be a concern. Users may not see a problem with high-level analysis or using specific data types.

      Again, we apologize for not fully elaborating upon our goals here. Our intention was to point out that toolboxes often focus on one particular case of high-level analysis. In many cases, such packages lack low-level analysis features or the flexibility to derive new analysis pipelines quickly and effortlessly. Users can decide to use low-level packages such as Pandas, but in that case, the learning curve can be steep for users with little, if any, computational background. The simplicity of Pynapple, together with its set of examples and notebooks, makes it possible for individuals who are just starting to code to quickly analyze their data.

      As we do not want to be too specific at this point of the manuscript (second paragraph of the intro) and as we have clarified many of the aspects of the toolbox in the new revised version, we have only added the following sentence to the paragraph:

      “Users can decide to use low-level data manipulation packages such as Pandas, but in that case, the learning curve can be steep for users with low, if any, computational background.”

      • The authors wrote -

      "To meet these needs, a general toolbox for data analysis must be designed with a few principles in mind"

      Toolboxes based on many different principles can solve problems. It is likely more accurate to say that the authors designed their toolbox with a particular set of principles in mind. A clear description of those principles (as mentioned in the comment above) would help the reader understand why the specific choices made are beneficial.

      We agree that these are not “universal” principles but rather the principles we had in mind when we designed the package. We have clarified these principles and made clear that they reflect our personal points of view.

      We have rephrased the following paragraph:

      “To meet these needs, we designed Pynapple, a general toolbox for data analysis in systems neuroscience, with a few principles in mind.”

      • The authors wrote -

      "The first property of such a toolbox is that it should be object-oriented, organizing software around data."

      What facts make this true? For example, React is a web development library. A common approach to using this library is to use Hooks (essentially a collection of functions). This is becoming more popular than the previous approach of using Components (a collection of classes). This is an example of how Object-oriented programming is not always the best solution. In some cases, for example, object-oriented coding can cause problems (e.g. it can be hard to find the place where a given function is defined and to figure out which version is being used given complex inheritance structures.)

      In general, key selling points of object-oriented programming are extension, inheritance, and encapsulation. If the authors want to retain this text (which would be entirely reasonable), it would be helpful if they explained clearly how an object-oriented approach enables these functions and why they are critical for this application in particular.

      The referee makes a particularly important point. We are aware of the limits of OOP, especially when objects become overly complex and inheritance becomes unclear.

      We have clarified our goal here. We believe that in our case, OOP is powerful and, overall, is less error-prone than a collection of functions. The reasons are the following:

      An object-oriented approach facilitates better interactions between objects. By encapsulating data and behavior within objects, object-oriented programming promotes clear and well-defined interfaces between objects. This results in more structured and manageable code, as objects communicate with each other through these well-defined interfaces. Such improved interactions lead to increased code reliability.

      Inheritance, a key concept in object-oriented programming, allows for the inheritance of properties. One important example of how inheritance is crucial in the Pynapple framework is the time support of Pynapple objects. It determines the valid epoch on which the object is defined. This property needs to be carried over during different manipulations of the object. Without OOP, this property could easily be forgotten, resulting in erroneous conclusions for many types of analysis. The simplest case is the average rate of a TS object: the rate must be computed on the time support (a property of TS objects), not from the beginning to the end of the recording (or of a specific epoch, independent of the TS). Finally, it is easier to access and manipulate the meta information of a Pynapple object than it would be without using objects.
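
      The time-support argument can be illustrated with a small, self-contained sketch. This is not Pynapple code: `ToyTs` is a hypothetical stand-in written for this example, and the timestamps are made up. It only shows why attaching the valid epochs to the object, and carrying them through each manipulation, changes the computed rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTs:
    """Hypothetical stand-in for a timestamp object with a time support."""

    times: tuple    # sorted timestamps, in seconds
    support: tuple  # (start, end) epochs on which the timestamps are defined

    def restrict(self, start, end):
        """Keep data inside [start, end]; the time support is narrowed too."""
        new_support = tuple(
            (max(s, start), min(e, end))
            for s, e in self.support
            if max(s, start) < min(e, end)
        )
        kept = tuple(
            t for t in self.times
            if any(s <= t <= e for s, e in new_support)
        )
        return ToyTs(kept, new_support)

    def rate(self):
        """Average rate over the time support, not over the whole recording."""
        duration = sum(e - s for s, e in self.support)
        if duration <= 0:
            raise ValueError("empty time support")
        return len(self.times) / duration


# A unit recorded during two 10 s epochs of a 100 s session (made-up data).
ts = ToyTs(times=(1, 2, 3, 4, 5, 51, 52, 53, 54, 55),
           support=((0, 10), (50, 60)))
naive_rate = len(ts.times) / 100   # ignores the time support
first_epoch = ts.restrict(0, 10)   # the support follows the data
```

      Here `ts.rate()` divides by the 20 s of valid recording, whereas `naive_rate` divides by the full session length and underestimates the rate. Because `restrict` returns a new object with the narrowed support, the user does not have to track the epochs by hand.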

      • The authors wrote -

      "drastically diminishing the odds of a coding error"

      This seems a bit strong here. Perhaps "reducing the odds" would be more accurate.

      We agree. Now changed.

      Page 3

      • The authors wrote -

      ". Another property of an efficient toolbox is that as much data as possible should be captured by only a small number of objects This ensures that the same code can be used for various datasets and eliminates the need of adapting the structure"

      It may be better to write something like - "Objects have a collection of preset variables/values that are well suited for general use and are very flexible." Capturing "as much data as possible" may be confusing, because it's not the amount that this helps with but rather the variety.

      We thank the referee for this remark. We have rephrased this sentence as follows:

      “Another property of an efficient toolbox is that a small number of objects could represent virtually all possible data streams in neuroscience, instead of objects made for specific physiological processes (e.g. spike trains).”

      • The authors wrote -

      "The properties listed above ensure the long-term stability of a toolbox, a crucial aspect for maintaining the code repository. Toolboxes built around these principles will be maximally flexible and will have the most general application"

      There are two issues with this statement. First, ensuring long-term stability is only possible with a long-term commitment of time and resources to ensure that the code remains functional as the underlying code base (python versions, etc.) changes. If that is something you are committing to, it would be great to make that clear. If not, these statements need to be less firm.

      Second, it is not clear how these properties were arrived at in the first place. There are things like the FAIR Principles which could provide an organizing framework, ideally when combined with good software engineering practices, and if some more systematic discussion of these properties and their justification could be added, it would help the field think about this issue more clearly.

      The referee makes a valid point that ensuring long-term stability requires a long-term commitment of time and resources to maintain the code as the underlying technology evolves. While we cannot make guarantees about the future of Pynapple, we believe that one of the best ways to ensure long-term stability is by fostering a strong community of users and contributors who can provide ongoing support and development. By promoting open-source collaboration and encouraging community involvement, we hope to create a sustainable ecosystem around Pynapple that can adapt to changes in technology and scientific practices over time. Ultimately, the longevity of any scientific tool depends on its adoption and use by the research community, and we hope that Pynapple can provide value to neuroscience researchers and continue to evolve and improve as the field progresses.

      It is noteworthy that the first author, and main developer of the package, has now been hired as a data scientist at the Center for Computational Neuroscience, Flatiron Institute, to explicitly continue the development of the tool and build a community of users and contributors.

      • The authors wrote -

      "each with a limited number of methods..."

      This may give the impression that the functionality is limited, so rephrasing may be helpful.

      Indeed! We have now rephrased this sentence:

      “The core of Pynapple is five versatile timeseries objects, whose methods make it possible to intuitively manipulate and analyze the data.”

      • The authors wrote that object-oriented coding

      "limits the chances of coding error"

      This is not always the case, but if it is the case here, it would be helpful if the authors explain exactly how it helps to use object-oriented approaches for this package.

      We agree with the referee that it is not always the case. As we explained above, we believe it is less error-prone than a collection of functions. Quite often, it also makes debugging easier. We have replaced this sentence with the following one:

      “Because objects are designed to be self-contained and interact with each other through well-defined methods, users are less likely to make errors when using them. This is because objects can enforce their own internal consistency, reducing the chances of data inconsistencies or unexpected behavior. Overall, OOP is a powerful tool for managing complexity and reducing errors in scientific programming.”

      • Fig 1

      In object-oriented programming, a class is a blueprint for the classes that inherit it. Instantiating that class creates an object. An object contains any or all of these - data, methods, and events. The figure could be improved if it maintained these organizational principles as figure properties.

      We agree with the referee’s remark regarding the logic of object instantiation, but it is unclear how this could be incorporated in Fig. 1 without making it too complex. Here, objects are instantiated from the first to the second column. We have not provided details about the parent objects, as we believe these details are not important for reader comprehension. In their present form, the objects inherit from Pandas objects, but it is possible that a future version will be based on something else. For the users, this will be transparent as the toolbox is designed in such a way that only the methods that are specific to Pynapple are needed to do most computations, while only expert programmers may be interested in using Pandas functionalities.

      • The authors wrote that Pynapple does -

      "not depend on any external package"

      As mentioned above, this is not true. It depends on Numpy and likely other packages, and this should be explained. It is perfectly reasonable to say that it depends on only a few other packages.

      As said above, we have now clarified this claim.

      Page 5.

      • The authors wrote -

      "represent arrays of Ts and Tsd"

      For a knowledgeable reader's reference, it would be helpful to refer to these either as Numpy arrays (at least at first when they are defined) or as lists if they are native python objects.

      Indeed, using the word “arrays” here could be confusing because of Numpy arrays. We have replaced this term with “groups”.

      • The authors wrote -

      "Pynapple is built with objects from the Pandas library ... Pynapple objects inherit the computational stability and flexibility"

      Here a definition of stability would be useful. Is it the case that by stability you mean "does not change often"? Or is some other meaning of stability implied?

      Yes, this is exactly what we meant when referring to the stability of Pandas. We have added the following precision:

      “As such, Pynapple objects inherit the long-term consistency of the code and the computational flexibility from this widely used package.”

      Page 6

      • Fig 2

      In Fig 2 A and B, the illustrations are good. It would also be very helpful to use toy code examples to illustrate how Pynapple will be used to carry out on a sample analysis-problem so that potential users can see what would need to be done.

      We appreciate the kind words. Regarding the toy code, this is what we tried to do in Fig. 4. Instead of including the code directly in the paper, which does not seem to be a modern way of doing this, we now refer to the online notebooks that reproduce all panels of Figure 4.

      • The authors wrote -

      "While these objects and methods are relatively few"

      In object-oriented programming, objects contain methods. If a method is not in an object, it is not technically a method but a function. It would be helpful if the authors made sure their terminology is accurate, perhaps by saying something like "While there are relatively few objects, and while each object has relatively few methods ... "

      We agree with the referee, we have changed the sentence accordingly.

      • The authors wrote -

      "if not implemented correctly, they can be both computationally intensive and highly susceptible to user error"

      Here the authors are using "correctly" to refer to two things - "accuracy" - getting the right answer, and "efficiency" - getting to that answer with relatively less computation. It would be clearer if they split out those two concepts in the phrasing.

      Indeed, we used the term to cover both aspects of the problem, leading to the two possible issues cited in the second part of the sentence. We have changed the sentence following the referee’s advice:

      “While there are relatively few objects, and while each object has relatively few methods, they are the foundation of almost any analysis in systems neuroscience. However, if not implemented efficiently, they can be computationally intensive and if not implemented accurately, they are highly susceptible to user error.”

      • In the next sentence the authors wrote -

      "Pynapple addresses this concern."

      This statement would benefit from just additional text explaining how the concern is addressed.

      We thank the referee for the suggestion. We have changed the sentence to this one: “The implementation of core features in Pynapple addresses the concerns of efficiency and accuracy”

      Page 9

      • The authors wrote -

      "This is implemented via a set of specialized object subclasses of the BaseLoader class. To avoid code redundancy, these I/O classes inherit the properties of the BaseLoader class."

      From a programming perspective, the point of a base class is to avoid redundancy, so it might be better to just mention that this avoids the need to redefine I/O operations in each class.

      We have rephrased the sentence as follows:

      “This is implemented via a set of specialized object subclasses of the BaseLoader class, avoiding the need to redefine I/O operations in each subclass"

      • The authors wrote -

      "classes are unique and independent from each other, ensuring stability"

      How do classes being unique and independent ensure stability? Perhaps here again the misunderstanding is due to the lack of a definition of stability.

      We thank the referee for the remark. We first replaced “stability” with “long-term backward compatibility”. We further added the following sentence to clarify this claim: “For instance, if the spike sorting tool Phy changes its output in the future, this would not affect the “Neurosuite” IO class as they are independent of each other. This allows each tool to be updated or modified independently, without requiring changes to the other tool or the overall data format.”
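
      The design described here (a shared base class for I/O, with independent format-specific subclasses) can be sketched as follows. This is an illustrative sketch only and does not reproduce Pynapple's BaseLoader: the class names `ToyBaseLoader`, `ToyCsvLoader`, and `ToyJsonLoader`, the `parse` method, and both file formats are hypothetical.

```python
import json
import tempfile
from pathlib import Path


class ToyBaseLoader:
    """Shared I/O logic; subclasses only describe how to parse their format."""

    def __init__(self, path):
        self.path = Path(path)
        # File reading is written once here and reused by every subclass.
        self.spikes = self.parse(self.path.read_text())

    def parse(self, text):
        raise NotImplementedError("subclasses define format-specific parsing")


class ToyCsvLoader(ToyBaseLoader):
    """Hypothetical format: spike times as comma-separated values."""

    def parse(self, text):
        return [float(x) for x in text.split(",") if x.strip()]


class ToyJsonLoader(ToyBaseLoader):
    """Hypothetical format: a JSON object with a 'spike_times' list."""

    def parse(self, text):
        return [float(x) for x in json.loads(text)["spike_times"]]


# Write the same made-up spike times in both formats and load them back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "spikes.csv"
    csv_path.write_text("0.1,0.5,1.2")
    json_path = Path(tmp) / "spikes.json"
    json_path.write_text(json.dumps({"spike_times": [0.1, 0.5, 1.2]}))
    from_csv = ToyCsvLoader(csv_path).spikes
    from_json = ToyJsonLoader(json_path).spikes
```

      Each subclass overrides only `parse`, so a change in one file format touches one class and leaves the others, and the downstream analysis code, unchanged.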

      • The authors wrote -

      "Using preexisting code to load data in a specific manner instead of rewriting already existing functions avoids preprocessing errors"

      Here it might be helpful to use the lingo of Object-oriented programming. (e.g. inheritance and polymorphism). Defining these terms for a neuroscience audience would be useful as well.

      We do not think it is necessary to use too many technical terms in this manuscript. However, this sentence was indeed confusing. We have now simplified it:

      “[…], users can develop their own custom I/O using available template classes. Pynapple already includes several such templates and we expect this collection to grow in the future.”

      Page 10

      • The authors wrote -

      "These analyses are powerful because they are able to describe the relationships between time series objects while requiring the fewest number of parameters to be set by the user."

      It is not clear that this makes for a powerful analysis as opposed to an easy-to-use analysis.

      We have replaced “powerful” with “easy to use”.

      Page 12

      "they are built-in and thus do not have any external dependencies"

      If the authors want to retain this, it would be helpful to explain (perhaps in the introduction) why having fewer external dependencies is useful. And is it true that these functions use only base python classes?

      We have rephrased this sentence as follows:

      “they are for the most part built-in and only depend on a few common external packages, ensuring that they can be used stand-alone without relying on packages that are at risk of not being maintained or of not being compatible in the near future.”

      Other comments:

      • It would be helpful, as mentioned in the public review, to frame this work in the broader context of what is needed to go from data to scientific results so that people understand what this package does and does not provide.

      We have added the following sentence to the discussion to make sure readers understand:

      “The path from data collection to reliable results involves a number of critical steps: exploratory data analysis, development of an analysis pipeline that can involve custom-made developed processing steps, and ideally the use of that pipeline and others to replicate the results. Pynapple provides a platform for these steps.”

      • It would also be helpful to describe the Pynapple software ecosystem as something that readers could contribute to. Note here that GNU may not be a good license. Technically, GNU requires any changes users make to Pynapple for their internal needs to be offered back to the Pynapple team. Some labs may find that burdensome or unacceptable. A workaround would be to have GNU and MIT licenses.

      The main restriction of the GPL license is that if the code is changed by others and released, a similar license should be used, so that it cannot become proprietary. We therefore stick to this choice of license.

      We would be more than happy to receive contributions from the community. Of note, several users outside the lab have already contributed. We have added the following sentence in the introduction:

      “As all users are also invited to contribute to the Pynapple ecosystem, this framework also provides a foundation upon which novel analyses can be shared and collectively built by the neuroscience community.”

      • This software shares some similarities with the nelpy package, and some mention of that package would be appropriate.

      While we acknowledge the reviewer's observation that Nelpy is a similar package to Pynapple, there are several important differences between the two.

      First, Nelpy includes predefined objects such as SpikeTrain, BinnedSpikeTrain, and AnalogSignal, whereas Pynapple would use only Ts and Tsd for those. This design choice was made to provide greater flexibility and allow users to define their own data structures as needed.

      Second, Nelpy is primarily focused on electrophysiology data, whereas Pynapple is designed to handle a wider range of data types, including calcium imaging and behavioral data. This reflects our belief that the NWB format should be able to accommodate diverse experimental paradigms and modalities.

      Finally, while Nelpy offers visualization and high-level analysis tools tailored to electrophysiology, Pynapple takes a more general-purpose approach. We believe that users should be free to choose their own visualization and analysis tools based on their specific needs and preferences.

      The package has now been cited.

      Reviewer #2 (Public Review):

      Pynapple and Pynacollada have the potential to become very valuable and foundational tools for the analysis of neurophysiological data. NWB still has a steep learning curve and Pynapple offers a user-friendly toolset that can also serve as a wrapper for NWB.

      The scope of the manuscript is not clear to me, and the authors could help clarify if Pynacollada and other toolsets in the making become a future aspect of this paper (and Pynapple), or are the authors planning on building these as separate publications.

      The author writes that Pynapple can be used without the I/O layer, but the author should clarify how or if Pynapple may work outside NWB.

      Absolutely. Pynapple can be used for generic data analysis, with no requirement of specific inputs nor NWB data. For example, the lab is currently using it for a computational project in which the data are loaded from simple files (and not from full I/O functions as provided in the toolbox) for further analysis and figure generation.

      This was already noted in the manuscript, last paragraph of the section “Importing data from common and custom pipelines”

      “Third, users can still use Pynapple without using the I/O layer of Pynapple.”.

      We have added the following sentence in the discussion

      “To note, Pynapple can be used without the I/O layer and independent of NWB for generic, on-the-fly analysis of data.”

      This brings us to an important fundamental question. What are the advantages of the current approach, where data is imported into the Ts objects, compared to doing the data import into NWB files directly, and then making Pynapple secondary objects loaded from the NWB file? Does NWB natively have the ability to store the 5 object types or are they initialized on every load call?

      NWB and Pynapple are complementary but not interdependent. NWB is meant to ensure long-term storage of data and as such contains as much information as possible to describe the experiment. Pynapple does not use NWB to directly store the objects; however, it can read from NWB to organize the data in Pynapple objects. Since the original version of this manuscript was submitted, new methods address this. Specifically, in the current beta version, each object now has a “save” method. We are also developing functions to load these objects. This does not depend on NWB but on npz, a NumPy-specific file format. However, we believe it is premature to include these recent developments in the manuscript and prefer not to discuss them for now.

      Many of these functions and objects have a long history in MATLAB - which documents their usefulness, and I believe it would be fitting to put further stress on this aspect - what aspects already existed in MATLAB and what is completely novel. A widely used MATLAB toolset, the FMA toolbox (the Freely moving animal toolbox) has not been cited, which I believe is a mistake.

      We agree that the FMA toolbox should have been cited. This has now been corrected.

      Pynapple was first developed in Matlab (it was then called TSToolbox). The first advantage is of course that Python is more accessible than Matlab. It has also been adopted by a large community of developers in data analysis and signal processing, which has become without a doubt much larger than the Matlab community, making it possible to find solutions online for virtually any problem one can have. Furthermore, in our experience, trainees are now unwilling to get training in Matlab.

      Yet, Python has drawbacks, which we are fully aware of. Matlab can be very computationally efficient, and old code can usually run without any change, even many years later.

      A limitation in using NWB files is its standardization with limited built-in options for derived data and additional metadata. How are derived data stored in the NWB files?

      NWB has predetermined a certain number of data containers, which are most common in systems neuroscience. It is theoretically possible to store any kind of data and associated metadata in NWB but this is difficult for a non-expert user. In addition, NWB does not allow data replacement, making it necessary to rewrite a whole new NWB file each time derived data are changed and stored. Therefore, we are currently addressing this issue as described above. Derived data and metadata will soon be easy to store and read.

      How is Pynapple handling an existing NWB dataset, where spikes, behavioral traces, and other data types have already been imported?

      This is an interesting point. In theory, Pynapple should be able to open an NWB file automatically, without the user providing much information. In practice, it is challenging to open an NWB file without knowing exactly what to look for and how the data were preprocessed. This would require adapting an I/O function for a specific NWB file. Unfortunately, we do not believe there is a universal solution to this problem. There are solutions being developed by others, for example NWB Widgets. We will keep an eye on this and see whether this could be adapted to create a universal NWB loader for Pynapple.

      Reviewer #2 (Recommendations For The Authors):

      Other tools and solutions are being developed by the NWB community. How will you make sure that these tools can take advantage of Pynapple and vice versa?

      We recognize the importance of collaboration within the NWB community and are committed to making sure that our tools can integrate seamlessly with other tools and solutions developed by the community.

      Regarding Pynapple specifically, we are designing it to be modular and flexible, with clear APIs and documentation, so that other tools can easily interface with it. One important thing is that we want to make sure Pynapple is not too dependent on another package or file format such as NWB. Ideally, Pynapple should be designed so that it is independent of the underlying data storage pipeline.

      Most of the tools that have been developed in the NWB community so far were designed for data visualisation and data conversion, something that Pynapple does not currently address. Multiple packages for behavioral analysis and exploration of electro/optophysiological datasets are compatible with the NWB format but do not provide additional solutions per se. They are complementary to Pynapple.

    1. Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markov coalescent model that allows simultaneous analysis of multiple types of polymorphism data. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes.

      Integrating additional types of heritable markers into SMC is a nice idea which I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. In Fig. 6 it is seen that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process. At the same time, I would be careful about placing too much emphasis on new findings that emerge solely by switching to SNP+SMP analysis.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript by Lan et al. addresses the still incompletely resolved question as to how branching morphogenesis of the embryonic mammary epithelium is regulated at the molecular and cellular level. Using (combinatorial) primary explant cultures of wildtype and genetically engineered mouse embryos, in which the authors have developed a unique expertise over many years, together with imaging and RNAseq analyses, they (i) show that the timing of epithelial branching is dictated by the biological age of the epithelium, but that an epithelial-mesenchymal interaction is required to bestow branching ability on the mammary epithelium somewhere between E13.5 and E16.5, (ii) seek to determine if and how lineage and cell proliferation affect branching, (iii) show that while salivary mesenchyme can promote growth (i.e. branching density) of the E16.5 mammary epithelium, the mode of branching (i.e. lateral branching vs tip-clefting) is an intrinsic property of the mammary epithelium, (iv) use transcriptomics to identify genes that are likely to control either mammary- or salivary gland specific growth and/or branching patterns, (v) hypothesize that low levels of WNT signaling in the mammary gland mesenchyme (due to relatively high expression of WNT signaling inhibitors) are responsible for mammary specific branching, (vi) show that hyperactivation of WNT/CTNNB1 signaling in the mesenchyme indeed induces hyperbranching, (vii) identify Eda and Igf1 as putative mediators and paracrine signaling factors that regulate branching of the mammary epithelium upon secretion from the mesenchyme downstream of WNT/CTNNB1 signaling and (viii) show that mammary gland branching is impaired in Igfr1 null embryos.

      Major comments: 1. Overall, this is a solid study that is well controlled and technically of high quality. The materials and methods should allow follow up and replication by others and the transcriptomic data have been made available via NCBI GEO. I think the authors convincingly demonstrate points (i), (iii), (iv) and (vi) and (viii). I have some questions regarding (ii), (v) and (vii) and (viii) that I will pose below.

      Our response:

      We thank the reviewer for the careful assessment and recognition of our work. In the subsequent sections, we have tried to address all the concerns raised by the reviewer.

      Re: (ii): The authors try to study the link between basal cell fate and branching. They use position of the cells (which they describe clearly and which is a good choice), since they cannot use specific markers due to the fact that the basal and luminal lineages have not yet segregated at this point. This part of the manuscript is not the most straightforward to follow. The most obvious experiment would have been to focus on the location of the cells and their associated cell cycle profile - but the authors themselves have just recently published a pre-print (their REF #54, now also out in JCB) that is an in-depth study of the link between cell proliferation + cell motility and branching, but this only becomes apparent in the discussion. In that sense, Fig2 of the current manuscript is less novel, although it is nice to see that it holds up in a slightly different analysis.

      Our response:

      We thank the reviewer for acknowledging our recently published work, which is focusing on the active branching phase during late embryogenesis/around birth. In the current proliferation analysis, however, our focus was on a different aspect of embryonic mammary gland development: understanding the mechanism underlying the ability to acquire competence to branch, i.e. how the epithelium changes between late bud and sprout stages. Our data obtained from tissue recombination and 3D culture experiments suggest that heterotypic mesenchymes or mesenchyme-free 3D organoid culture conditions do not provide sufficient signals to support branching of mammary epithelia before E16.5. We have rephrased the text to better emphasize this point.

      Instead of focusing on the cell cycle markers, the authors turn to a K14-Eda mouse model - which shows precocious branching and a temporary reduction in K8 expression. They also analyze Eda-KO embryos. Quite frankly, I find the authors' reasoning difficult to follow here and I cannot deduce how these experiments really address the question at hand (i.e. how lineage and cell proliferation affect branching), so I hope they can rewrite this section of the paper to make the arguments more clear and easy to follow for the reader who, at this point, knows little about Eda. For example, the authors present the argument that K14-Eda mice show a transient reduction in K8 expression - but we don't know if that also really means a (temporary?) change in (future?) luminal cell fate. In fact, since Eda later also makes an appearance as a candidate factor to be secreted by the mesenchyme together with Igf1, I wonder if their K14-Eda data would not be better suited to underscore that point instead and if the authors should perhaps eliminate this section altogether and just refer to their prior work in REF #45. If the authors think the current data add something more, then they need to be more explicit about this (and then also introduce the link to REF #45 in the results section).

      Our response:

      We agree with all the reviewers in that this part of the manuscript was not mature enough and provided only indirect evidence on the potential link between lineage segregation and branching ability. This is an important question in the field that merits a study of its own and should be addressed with better tools than those available to us at present. As suggested by reviewers #1 and #3, we have omitted this part in the revised manuscript.

      Re: (v): Do the authors have any WNT/CTNNB1 target genes that they can include in their transcriptomics analysis to show that the WNT/CTNNB1 signaling levels are indeed lower in the mammary mesenchyme? Axin2 comes to mind, but there are some other negative feedback targets that are often induced across tissues, e.g. Rnf43 and/or Znrf3 and/or Sp5? E.g. to include in Fig6E?

      Our response:

      In the original manuscript (lines 339-342), we have performed the GSVA analysis comparing the KEGG database, and the significantly altered pathways comparing different mammary mesenchymes with salivary gland mesenchyme have been pooled and displayed as heatmap in Supplementary Fig 4b. The WNT signaling pathway is lower in the mammary mesenchyme, especially at E16.5.

      As suggested by the reviewer, we have analyzed Axin2, the most commonly used readout of WNT/CTNNB1 signaling activity, in our RNA-seq data and include it as new Supplemental Fig. 4c in the preliminary revised manuscript. Axin2 data indicate that Wnt/β-catenin signaling activity is lower in the E16.5 fat pad, where branching takes place, compared to younger stages of mammary gland and the salivary gland.

      Plan for the final revision:

      Additionally, we will provide expression data of a transgenic Wnt reporter from the same developmental stages and tissues that were used to generate the RNA-seq data.

      Re: (vii) and (viii): The authors convincingly show the phenotype of the Igfr1 KO mice, but I hope the authors concur that an epithelial only Igfr1 KO (or alternatively a mesenchymal only Igf1 KO, or epithelial/mesenchymal recombination experiments with WT vs IGFR1 null or IGF1 null tissue, or experiments with small molecule inhibitors of IGF1/IGFR1 signaling) would have given more solid mechanistic evidence regarding the presumed paracrine effect of IGF1 signaling. I am not asking the authors to perform another mouse experiment or even generate or use these conditional strains, but if the authors agree, then I do think this would merit some attention in the discussion section. See also my comments regarding Eda in point 1.

      Our response:

      As shown in the current manuscript, Igf1 is expressed in the mammary and salivary gland mesenchyme. This finding is in line with E14 in situ expression data available in Genepaint (https://gp3.mpg.de/results/Igf1) showing that overall in embryonic tissues, Igf1 is mainly produced in mesenchymal tissues. Of note, in Genepaint, a clear signal can be detected in the salivary gland mesenchyme, not the epithelium. Published E16 and E18 datasets indicate low level of Igf1 expression in the mammary epithelium (https://wahl-lab-salk.shinyapps.io/Mammary_snATAC/). Hence, we conclude that Igf1 is mainly produced by mesenchymal cells. Instead, Igf1r appears to be rather ubiquitously expressed.

      A previous study assessed BrdU incorporation in Igf1r-/- mammary buds at E14.5, and reported a specific proliferation defect in the epithelium, while no difference was detected in the mesenchyme (Fig. 9, Heckman et al., 2007; PMID:17662267). However, we cannot exclude the possibility of autocrine, mesenchymal Igf1/Igf1r signaling, which in turn could lead to upregulation of a paracrine factor to regulate epithelial growth.

      We agree with the reviewer that novel conditional mouse models are beyond the scope of the current study. However, we do not think that small molecule drugs could be used to block Igf1r activity in a tissue-specific manner either.

      Plan for the final revision:

      To further delineate the paracrine and/or autocrine role of Igf1/Igf1r pathway during mammary epithelial growth and branching, we will perform tissue recombination experiments between Igf1r-/- and control mammary epithelium and mesenchyme, as suggested by the reviewer.

      Minor comments: - A few minor spelling/grammar errors, including a couple of "the"s missing (first line of the abstract, and also preceding "Majority" in line 148).

      Our response:

      We apologize for these slips. They have been corrected in the revised manuscript.

      • Line 517-518: please also include the details for the Eda mice.

      Our response:

      We apologize for missing this important information in materials and methods. We have included a short introduction of the K14-Eda mice, a new reference for the original publication producing them, as well as the Jackson Laboratories strain number for Eda-/- (a.k.a. Tabby) mice in the revised manuscript.

      • 1f spelling error: separation

      Our response:

      The spelling error has been corrected in the revised manuscript.

      **Referees cross-commenting**

      Having read all three review reports I think they are pretty much in agreement, with shared questions about the inclusion/meaning/discussion of the lineage specification data and also agreement about the overall technical solidity of the data and this approach.

      I gather that reviewer #2 asks for more controls than myself or reviewer #3 and while I think all of their points are valid, in principle, I don't think all of these are required. I should add that I am inclined to trust the authors on their ability to separate mesenchyme and epithelium as they have been developing and optimising this system over many years.

      Our response:

      We are grateful to the reviewer for their confidence in the technical aspects of our experiments. We do routinely monitor tissue purity in the recombinants (for more details, see our response to reviewer #2). To demonstrate this, we have included new data in new Supplementary Fig. 1a,b and new Supplementary Fig. 3. We believe these additions will further enhance the validity of our findings and effectively address the concerns raised by reviewer #2.

      Reviewer #1 (Significance (Required)):

      General assessment: This is a carefully executed study in which an impressive amount of (combinatorial) embryonic mammary tissue explant experiments are combined with quantitative imaging and transcriptomics analysis.

      The main limitations of the work lie in the fact that the investigation of a potential link between branching and the cell cycle is not entirely novel, as the authors themselves recently published a nice pre-print (now also out in JCB) describing similar analyses. In addition, the mechanistic link between WNT/CTNNB1 signaling in the mesenchyme and the paracrine signaling activities of the presumed downstream effectors EDA and IGF, while plausible, is not yet complete. The work also does not yet address what exactly the branching identity is that is bestowed upon the mammary epithelium between E13.5 and E16.5 and how this then becomes an intrinsic (epigenetic?) feature of the mammary gland.

      Advance: This work provides more insight into the embryonic branching of the mammary gland - a stage of mammary gland development that is still poorly understood and that is, in general, understudied. In part, the work confirms prior work in the literature (their REF #19) regarding mammary and salivary gland tissue recombination experiments. It supplements this with a more elaborate time series of heterochronic and heterologous epithelium/mesenchyme explant cultures, using genetically engineered (and fluorescently labeled) mouse tissues to allow better and quantitative imaging. The transcriptomic analysis of different mesenchyme populations is also informative and allows the researchers to propose a putative mechanism for why the mammary gland branches differently from the salivary gland. The advance is both technical and functional, as well as conceptual, with some advance in terms of mechanism.

      Audience: This work should appeal to mammary gland biologists interested in the molecular and cellular mechanisms of (early) mammary gland development, as well as to a broader community of developmental biologists studying branching morphogenesis in tissues such as lung, kidney and salivary gland.

      My expertise: WNT signaling and mammary gland biology, at the intersection of developmental, stem cell and cancer biology

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The mammary gland is a branched structure that consists of a bilayered epithelium embedded in a specialized mesenchyme. In mice, at 11.5 days of embryogenesis, the ectoderm thickens forming 5 pairs of peculiar structures called placodes. During the following days, the placodes will grow and invaginate into the surrounding mammary mesenchyme and they will finally start to branch by the end of embryogenesis (E16). It has been suggested that the bidirectional communication between the growing mammary gland and the surrounding mesenchyme plays a pivotal role in the determination of each step of mammary gland development (placode formation, mammary bud invagination, gland outgrowth, branching). The role of different signalling molecules has already been shown, particularly for the placode growth and mammary bud invagination. Nevertheless, the pathways regulating embryonic mammary gland branching are still incompletely understood. In this manuscript, Lan and colleagues aim to decipher the correlation between different stages of mammary gland development such as proliferation, lineage segregation and ductal branching. Furthermore, they want to define which stage of mammary development is intrinsically determined by the epithelium and which one requires the supportive guidance of the mesenchyme. Lastly, they aim to discover the key signal for the growth and branching of mammary epithelium. To these purposes, they used an ex vivo model of heterochronic epithelial-mesenchymal recombination. In particular, they micro-dissected the epithelium and/or the mesenchyme from murine mammary glands at different stages of embryonic development (i.e. at E13.5 for the quiescent phase or E16.5 for branching phase) and explanted them together in different combinations using fluorescent reporters. To assess the role of the mesenchyme they also cultured the epithelium in a mesenchyme-free 3D structure.
Through this model they demonstrated that the presence of the mesenchyme is necessary for the priming of mammary epithelium for branching, since only E16.5 epithelial cells were able to grow and branch in a mesenchyme-free 3D experiment. Nevertheless, intrinsic properties of the epithelium are necessary for the timing of branching, since E16.5 mesenchyme was not able to accelerate the outgrowth of E13.5 epithelia. In order to determine which epithelial properties are important, the authors correlated the beginning of cell proliferation in the embryonic mammary gland to the beginning of the branching phase. They indeed used the Fucci2a mouse model to carefully characterise the timing of mammary cell proliferation at different stages of embryonic development, concluding that the great majority of proliferating cells reside in the inner part of the mammary bud until E14.5, while in the external part at later stages. Regarding the importance of cell proliferation, Lan and colleagues claim that the beginning of the branching phase is not its direct consequence, thanks to the use of the K14Cre-Eda mouse model, known to have anticipated mammary gland development. Using this and the Eda-/- models, the authors also maintain that branching occurs independently of the lineage specification of the epithelium. The use of salivary mesenchyme instead of the mammary one was able to increase the number of branches of E16.5 mammary epithelium. Nevertheless, this model demonstrated that the branching pattern (side branching vs tip bifurcation) is an intrinsic feature of the epithelium. Lan and colleagues also defined the transcriptomic profiles of the mammary and salivary mesenchymes at different stages. In particular, they observed an increased expression of negative regulators of Wnt pathway in the mammary mesenchyme compared to the salivary mesenchyme.
Moreover, using a mouse model where β-catenin is stabilised, they observed increased tip production in the mammary gland epithelium. They also showed that IGF1 production is increased after Wnt pathway activation and they tested its function, both treating their ex vivo cultures with exogenous IGF1 and using Igf1r-/- mouse models.

      Major comments 1- The great majority of the results of the manuscript are based on an ex vivo model of heterochronic epithelial-mesenchymal recombination. Since the authors are studying the effect of the mesenchyme of different stages on the epithelium (and vice versa), the purity of the two compartments after the dissection is particularly important. Although they said that the purity is evaluated (line 112), it would be important to show a control staining in which they use known markers of the mesenchyme with no colocalization with the fluorescent reporter of the epithelium.

      Our response:

      We agree with the reviewer that the purity of the separated tissues is very important for our conclusions. This is why we have used genetically labeled tissues in all recombination experiments: the epithelium and the mesenchyme were always isolated from embryos ubiquitously expressing GFP or tdTomato. We find this the most reliable way to assess the origin and purity of the isolated tissues. If there was any carry-over mesenchyme isolated with the GFP+ epithelium, this would be revealed as GFP+ mesenchymal cells in the recombinants consisting of otherwise tdTomato+ mesenchyme. And vice versa: any carry-over tdTomato+ epithelium isolated with the mesenchyme would be revealed as tdTomato+ epithelial cells in the recombinants. We apologize for not making this clear enough in the original manuscript. In the revised manuscript, we now provide confocal high-resolution images of the recombinant explants (new Supplementary Fig. 1a,b). The explants have been co-stained with the epithelial marker EpCAM, revealing a robust colocalization between the ubiquitously expressed fluorescent labels in the designated epithelial tissues and EpCAM.

      2- Another important point for understanding the quality and impact of these findings is to assess the similarities and differences, if there are, between the in vivo mesenchyme and the ex vivo one. Indeed, once explanted and put in culture, mesenchymal cells could change their transcriptomic profile and consequently change their signals to the epithelium. The authors should assess the expression of the genes and pathways studied during embryonic development in vivo.

      Our response:

      The reviewer is correct in that the transcriptomes will likely undergo some changes when organs are cultured ex vivo. This is why RNA-seq was done on freshly isolated tissues. Regarding the potential changes taking place ex vivo, however, we do not consider them relevant with respect to the questions we are addressing in this study. The reason is (as reported in the manuscript) that all control recombinations (homochronic recombinations such as E13 epithelium + E13 mesenchyme, E16 epithelium + E16 mesenchyme etc.) branched essentially as in vivo. Therefore, we find the results and conclusions made from the tissue recombination experiments solid.

      3- The authors clearly showed that E16.5 epithelium is able to branch in a mesenchyme-free 3D culture model, while epithelia from earlier stages don't. This led to the conclusion that mesenchyme is necessary for acquiring the branching ability. Nevertheless, the authors also said that early stages epithelia scarcely grow in the mesenchyme-free 3D culture. Therefore, the lack of branching may be due to the lack of growth, if not the increase of death, of epithelial cells. The authors should quantify the size and the cell death of the epithelia in the different culture conditions and discuss better this point.

      Our response:

      The reviewer is correct in that one of the key functions of the mammary mesenchyme up to E16.5 may be to provide survival signals for the epithelium, and this might explain why epithelia younger than E16.5 fail to grow/branch when recombined with salivary gland mesenchyme and in mesenchyme-free organoid culture.

      Plan for the final revision:

      To address this issue, we will assess apoptosis in mammary epithelia cultured in the mesenchyme-free 3D culture organoid set-up.

      4- The Fucci2a model allowed to assess the proliferation of embryonic mammary epithelium, showing that the great majority of proliferating cells are basal, at late stages of development (line 182). As it has already been shown, lineage specification is a late process during mammary gland development. The fact that the proliferating cells reside at the external part of the bud does not mean that they are basal cells yet. A p63/K8 staining could be important to understand if the increased proliferation occurred in already specified basal cells or not.

      Our response:

      Indeed, mammary lineage specification is a later process. As pointed out in the manuscript and by reviewer #1, the widely used basal and luminal lineage markers have not yet segregated to separate compartments at the developmental stages analyzed in our study, and therefore cannot be used as tools for this purpose. We would like to emphasize that in the manuscript, we analyzed the cells based on their position, and have used the term basal to indicate the basal position, not the prospective lineage. Accordingly, we used the term inner instead of luminal cells to indicate their location, not lineage. We have further clarified this point in the preliminary revised manuscript.

      5- The use of Fucci2a model showed that 20% of epithelial cells are proliferative at E13,5. This phase is considered as "quiescent" by the authors (line 120), but the moderate proliferation rate shown in this experiment demonstrated that it is not. A change of the nomenclature is needed.

      Our response:

      We have removed the word “quiescent” from the text.

      6- Through the use of K14-Eda and Eda-/- models, the authors claimed that the lineage specification is not a prerequisite for ductal branching. To support this point, they showed that the K14-Eda mice have an anticipated branching although the expression of K8 in the inner part of the bud is transitorily decreased. The authors link the K8 downregulation to a transient suppression of the luminal lineage, but this is clearly overclaimed. Although K8 is a known marker of luminal lineage, the downregulation of one marker is not sufficient to support their thesis. They should first check more markers and in particular critical regulators of luminal lineage as Notch1, Foxa1 and Elf5. Lately, the use of different models that drive embryonic epithelial cells to a forced lineage commitment (Notch1 or Δnp63 overexpression) would support more their claim. As additional evidence, the authors showed that Eda is able to promote basal cell signature. Firstly, the authors should better explain why this point would support their thesis. Secondly, the supplementary figure 2b does not show which genes are taken into account to define the basal signature. A list of these genes would be helpful, as well as staining for some representative proteins.

      Our response:

      We thank the reviewer for these constructive suggestions. We agree with all reviewers in that this part of the manuscript was not mature enough and provided only indirect evidence on the potential link between lineage segregation and branching ability. This is an important question in the field that merits a study of its own to be addressed with better tools than those available to us at present. As suggested by reviewers #1 and #3, we have omitted this part in the revised manuscript.

      7- The authors used the same mouse models to assess the importance of proliferation in the determination of ductal branching and they claimed that proliferation is not a sufficient feature. This conclusion was supported by two observations. The first one is the fact that the K14-Eda model shows an increased cell proliferation at early stages compared to wt, coupled with anticipated branching. Secondly, although having smaller glands compared to wt and showing a delay in ductal branching, Eda-/- mice have an epithelial proliferation rate very similar to wt. Again, the conclusion that proliferation is not sufficient for branching is overclaimed. Firstly, the authors should explain how the buds in wt and Eda-/- mice have different sizes although the similar proliferation (increased cell death?, cellular volume?). Secondly, to support the thesis that proliferation is not sufficient for branching, functional experiments should be performed (see point 12). For instance, the short-time treatments with inhibitors or promotors of proliferation may help to understand the effective role of proliferation in the determination of branching.

      Our response:

      We show that there is no direct link between onset of proliferation and acquisition of branching ability. However, we are not claiming that proliferation is not important for branching, as obviously new cells are needed as building blocks of growing tissues. In a recently published paper, we have assessed the role of proliferation in branch point formation in embryonic mammary glands. Using mitomycin C to block proliferation, we showed that initiation of new branches occurs even when proliferation is blocked (Myllymäki et al., JCB 2023, PMID: 37367826).

      The reviewer was also asking why Eda-/- mammary primordia are smaller at E15.5-E16.5 despite similar proliferation rates. In the revised manuscript, we have quantified the volume of E13.5 Eda-/- and control mammary buds and show that Eda-/- buds are ~25% smaller (3.5 ± 0.8 × 10⁵ µm³ in Eda-/- vs. 4.6 ± 0.7 × 10⁵ µm³ in control, mean ± SD) already at the bud stage (new Supplementary Fig. 2c,d).
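
      The quoted percentage can be checked from the two mean volumes alone. The following is a minimal sketch of that arithmetic, not part of the analysis in the manuscript; the function name `relative_reduction` is hypothetical, and the standard deviations are deliberately left out.

```python
def relative_reduction(control_mean: float, mutant_mean: float) -> float:
    """Return the fraction by which the mutant mean falls below the control mean."""
    if control_mean <= 0:
        raise ValueError("control_mean must be positive")
    return 1.0 - mutant_mean / control_mean


# Mean E13.5 bud volumes quoted in the response, in cubic micrometres.
control_volume = 4.6e5
eda_null_volume = 3.5e5

reduction = relative_reduction(control_volume, eda_null_volume)
print(f"Eda-/- buds are about {reduction:.0%} smaller than controls")
```

      With the means given above, this evaluates to roughly 24%, in line with the "~25% smaller" stated in the response.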

      We have also quantified the cell size in Eda-/- and control mammary glands at E13.5 and E15.5 and found that mammary epithelial cells in Eda-/- embryos are ~15% smaller (new Supplementary Fig. 2e,f). Together, these data indicate that the smaller size of E15.5-E16.5 Eda-/- mammary glands is a combined effect of the smaller mammary anlage at E13.5 and the smaller cell size. These findings, while interesting on their own, do not challenge our conclusions regarding the link between onset of proliferation and acquisition of branching ability.

      8- The heterotypic epithelial-mesenchymal recombination using the salivary gland is interesting. Nevertheless, some stainings to assess the purity of their systems are again required (e.g., marker of salivary epithelium to verify the purity of the mesenchyme and vice versa).

      Our response:

      As mentioned above, all tissue recombination experiments were performed so that the epithelium and the mesenchyme originated from genetically labelled embryos expressing different fluorescent proteins. In the revised manuscript, we provide confocal images of the salivary-mammary tissue recombinants (new Supplementary Fig. 3), confirming the purity of the tissue compartments used in these experiments.

      This model clearly showed that the mammary epithelium can form more branching when combined with the salivary mesenchyme. Moreover, the salivary epithelium preferentially branches through tip bifurcation, while mammary epithelium combined with the salivary mesenchyme has a mixed pattern of tip bifurcation and side branching (typical of the mammary gland). The authors thus concluded that the branching pattern is an intrinsic feature of the epithelium. However, a comparison between the percentage of tip bifurcation and side branching in the heterotypic combination and the homotypic combination between mammary epithelium and mammary mesenchyme is crucial to understand this point. Indeed, these results are not sufficient to exclude that the branching pattern is partially determined by intrinsic features and partially by extrinsic signals. The authors should carefully quantify the branching pattern in the homotypic combination and compare that to the heterotypic one. If the percentage of tip bifurcation do not change, their conclusion is correct; if this percentage increases in the heterotypic combination, it would be a sign of a partial effect of the signals of the mesenchyme.

      Our response:

      We thank the reviewer for raising this question. We have independently generated data on the type of mammary gland branching events in two papers with somewhat different culture and imaging conditions (Lindström et al., bioRxiv 2022 and Myllymäki et al., JCB 2023, PMID: 37367826). Both analyses showed that in embryonic mammary glands, the majority of branching events (~70%) occur by side-branching. These data are in line with the current study, which we have now extended to include the mammary-mammary recombination experiments (revised Supplementary Video 1, revised Fig. 4b). Quantification of branching events revealed no significant difference in the type of branching events of mammary epithelia grown with salivary or mammary gland mesenchyme (revised Fig. 4c), further supporting our initial conclusions.

      9- Through the analysis of their transcriptomic data, Lan and colleagues found that the mammary mesenchyme expresses higher levels of negative regulators of Wnt pathway compared to the salivary mesenchyme. To demonstrate the value of their findings, they should confirm this in vivo, through staining of known Wnt proteins on the salivary and mammary mesenchymes at the embryonic stage.

      Our response:

      In mammals, there are 19 Wnt ligands, over a dozen secreted Wnt inhibitors, 10 Frizzled receptors, two Lrp co-receptors, and numerous other pathway modifiers that contribute to the net Wnt signaling activity in a complex manner. Furthermore, it has been “notoriously difficult to generate useful antibodies to vertebrate Wnt proteins...In general, these sera do not detect endogenous Wnt proteins in cell extracts, nor do they detect Wnt proteins in tissues by staining techniques. Hence, there are few data on Wnt protein distribution in intact vertebrate animals.” This is a direct quotation from the Wnt Homepage, maintained by the Nusse Lab; https://web.stanford.edu/group/nusselab/cgi-bin/wnt/reagents#antibod.

      For all these reasons, we do not find this approach either feasible or informative.

      Instead, in the revised manuscript, we report the expression levels of Axin2, the most commonly used transcriptional readout of canonical Wnt activity in our RNA-seq data (new Supplementary Fig. 4c). Axin2 levels are lowest in the E16 fat pad where mammary branching takes place, much lower than in any other tissues analyzed in the study.

      Plan for the final revision:

      To complement these findings, we will additionally provide expression data of a transgenic Wnt reporter from the same developmental stages and tissues that were used to generate the RNA-seq data.

      10- Since the ability of the salivary mesenchyme to promote a higher rate of branching in the mammary epithelium, the authors wanted to assess what could be the role of Wnt signalling. To do so, they used a mouse model where B-catenin is stabilised, allowing an increased Wnt signalling in the mammary mesenchyme. As a result, they observed increased branching in the mammary epithelium. They also found that IGF1 is a ligand regulated by Wnt pathway in the mesenchyme. Therefore, the use of exogenous IGF1 in their ex vivo model was able to increase the branching of the mammary epithelium. Moreover, Igf1r-/- embryos showed a significant decrease of mammary gland branching. The conclusion based on these experiments was that the Wnt-Igf1-Igf1r axis plays a pivotal role in the promotion of mammary gland branching during embryogenesis. This conclusion is overclaimed for different reasons. Firstly, the normalization of the ductal branching to the body weight is insufficient to exclude that the impact of the Igf1r knockout may have severe consequences on the mammary gland formation, upstream of the ductal branching. Another parameter for this normalization is required (e.g., size of the bud before branching, proliferation status, etc).

      Our response:

      We agree with the reviewer in that Igf1r knockout may affect mammary gland formation in multiple ways, and also prior to onset of branching, as already indicated in the original manuscript: “…apart from one study reporting the smaller size of the E14 mammary bud in IGF-1R deficient embryos …” (line 398-399 in the revised version) and ‘…mammary gland 3 that was consistently absent.’ (line 414-415 in the revised version).

      To assess whether the reduced size and branching of E16.5/E18.5 Igf1r-/- mammary glands is merely a consequence of the smaller anlage, the revised manuscript includes new data reporting quantification of the volume of mammary gland 2 of Igf1r-/- and wild type littermate embryos at E13.5, E16.5, and E18.5 from 3D confocal images of whole mount EpCAM stained mammary glands. As can be seen from the new Fig. 7g-h, the mutant mammary buds are about 60% of the size of the controls at E13.5, 25% at E16.5, and only 20% at E18.5, revealing a progressive defect that is indicative of a specific defect at the outgrowth and branching stage. This conclusion was validated by normalization to the body weight: at E13.5 the size of the Igf1r-/- mammary anlage did not differ from that of the wild type embryos (p = 0.11), at E16.5 the sprouts were smaller in the mutants, though the difference did not reach statistical significance (p = 0.08), while at E18.5, the Igf1r-/- mammary glands were significantly smaller (p = 0.000021) (new Fig. 7i). We find these data compelling evidence for a specific role for Igf1r in outgrowth and branching of the embryonic mammary gland.

      The use of alternative models to specifically knockout the receptor in the epithelium or the ligand in the mesenchyme (e.g. viruses) would be even more useful to specifically focus on the role of this pathway for ductal branching excluding side effects.

      Our response:

      We thank the reviewer for this suggestion. Unfortunately, in our experience, viral shRNA delivery is not efficient enough for effective gene silencing. This contrasts with the Cre delivery used in the current study for a gain-of-function approach (to flox out exon 3 of beta-catenin): when the endogenous pathway activity is very low, targeting even a subset of cells is sufficient to upregulate paracrine factors.

      Plan for the final revision:

      To address the question on the autocrine or paracrine role of Igf1r, we will perform tissue recombination experiments between Igf1r-/- and control mammary epithelium and mesenchyme.

      Another limit of this model is the fact that Igf1r can be bound by Igf2 as well and we cannot exclude that this has an impact too (except if Igf2 is not expressed at this stage). A quantification of Igf2 expression may be useful.

      Our response:

      Indeed, we cannot exclude the possibility that Igf2 could also play a role (Igf2 expression was similar to Igf1 in our RNA-seq dataset, see Supplementary Fig. 5), but the connection of mesenchymal Wnt signaling activity was to Igf1, not Igf2 – in fact Igf2 was somewhat downregulated in Wnt3A treated sample reported by Wang et al. (Wang et al., 2021) (highlighted by an arrow in the revised Fig. 6). We have also clarified this point in the Discussion of the preliminary revised manuscript.

      11- From the experiments presented in this section it is clear that Wnt-Igf1-Igf1r axis has to be finely regulated to have the correct amount of ductal branching in the embryonic mammary epithelium. Nevertheless, the author just showed the RNA levels of Igf1 in the different compartments they have analysed. Stainings to see the effective presence of the ligand on the tissue is mandatory to clarify the role of this axis in the ductal branching in vivo.

      Our response:

      Igf1-Igf1r signaling plays a critical growth promoting function during embryonic and postnatal development. The expression of Igf1 at RNA and protein level has been detected in almost all tissues in humans (Daughaday et al., Endocr. Rev., 1989; PMID: 2666112). Given that Igf1 is a secreted protein and multiple Igf binding proteins (Igfbps) (that regulate the bioactivity of Igf1 by sequestering it) are expressed in the mammary and salivary gland mesenchyme (Supplementary Fig. 5), we find it unlikely that Igf1 staining would provide any additional information to the current study, as it cannot be used to assess the source of Igf1, nor the location of the signaling activity.

      Furthermore, as underlined by the authors, this axis is specifically important and upregulated in the salivary gland. Due to the limits of the Igf1r-/- model, we cannot exclude that, although Wnt-Igf1-Igf1r axis is able to increase the branching ability of mammary epithelium, the normal branching rate observed in wt mice is due to other pathways.

      Our response:

      We agree with the reviewer in that other pathways are also important in regulating normal mammary gland branching, for example, Eda/NF-κB and FGF pathways as we described in the Introduction. Our results do not exclude the possibility that also pathways other than Wnt regulate Igf1 expression. The reviewer is correct that if a paracrine factor is expressed in the salivary gland but not in the mammary mesenchyme, its physiological effect may be limited to the salivary gland. Indeed, cluster 5 identified by the mFuzzy analysis (Fig. 5f) is likely to include some genes like that. This is why we decided to focus on cluster 6 genes like Igf1. In the revised manuscript, we have better highlighted the difference between cluster 5 and 6 genes.

      Unfortunately, with the currently available tools, we cannot test the importance of the endogenous mesenchymal Wnt signaling activity by inactivating Wnt signaling activity specifically in the mesenchyme at the time point when branching begins. This would require an inducible mesenchymal Cre line (mesenchymal β-catenin is essential for the early fate specification of the primary mammary mesenchyme; Hiremath et al., 2012, PMID: 23034629) and a conditional β-catenin null mouse. We do not have such mice available and we find that these experiments are beyond the scope of the current study.

      12- Lastly, once claimed to have found the key factor necessary for ductal branching promotion, the authors should also test if the proliferation and lineage segregations are unaffected in this context, confirming their dispensable role claimed in the initial part of the manuscript.

      Our response:

      Igf1/Igf1r is well-known for its growth promoting function via cell proliferation. We have no reasons to think that this would not be the case also in the mammary gland, and it was not our intention to give the impression that proliferation was not affected. In fact, Hiremath et al. (2012) already reported a defect in epithelial cell proliferation in Igf1r-/- mammary buds at E14. Our key finding is that compared to other organs, the mammary gland is particularly sensitive to loss of Igf1r during branching morphogenesis. Finally, as pointed out earlier, better tools will be needed to assess the potential link between lineage segregation and onset of branching, a topic that we hope to address in the future.

      Minor comments:

      1- An important paper on mammary gland ductal branching was published on Nature in 2017 by Scheele and colleagues and should be presented in the introduction, even though it is at later stages (after birth).

      Our response:

      We thank the reviewer for the suggestion. In the revised manuscript, we have added the findings from Scheele et al. 2017 in the introduction.

      2- In line 136 and 139 the authors referred to Fig 2 but it should be Fig 1

      Our response:

      We apologize for these slips. They have been corrected in the revised manuscript.

      3- The sentence on line 142 should be rephrased, since "advanced developmental stages" may be referred to pubertal development. The authors should specify that they are talking about embryonic development.

      Our response: We apologize for the potential misunderstanding. In the revised manuscript, we have used the phrase “advanced embryonic developmental stage” to describe our conclusion more precisely.

      Reviewer #2 (Significance (Required)):

      Overall, the authors concluded that embryonic mammary gland development and branching are extremely sensitive to the loss of IGF1, normally produced by the mesenchyme. The topic of the paper is interesting, the experimental approaches are well conceived, the data are convincing and the findings are of interest to developmental biologists. Nevertheless, there are some significant points that need to be further investigated before considering the manuscript suitable for publication:

      Our response:

      We thank the reviewer for the careful assessment and positive feedback of our manuscript. We have already addressed most of the points raised and most remaining ones will be addressed in the final revised manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Here the authors use classical embryonic tissue recombination and pharmacological manipulation of explants in conjunction with cutting edge 3D imaging of tissue derived from highly sophisticated reporter and knock-out mouse models and state of the art transcriptomic analysis to masterfully delineate and dissect regulatory pathways critical for embryonic mammary development. Specifically, they set out to parse regulation of proliferation from that of branch patterning.

      While it has long been established that epithelial-mesenchymal interaction is necessary for mammary branching this work shows by heterochronic recombination that initiation of mammary branching is not advanced by mesenchymal stage. By examining Fucci2a embryos the authors demonstrate that branching is preceded by a significant increase in basal cell-biased proliferation but, through further analysis of Eda gain and loss of function mice, conclude that proliferation per se does not cause branching. They show by heterotypic recombination with salivary tissue that early mammary epithelial rudiments require their own mesenchyme for survival and that although later E16.5 rudiments expand more robustly when in contact with salivary mesenchyme they nevertheless retain their characteristic mammary branch pattern. Thus, they establish that initiation and patterning are intrinsic properties of the epithelium but that early survival and later expansion/proliferation is regulated by the mesenchymal context. By transcriptomic comparison of mammary and salivary mesenchyme they reveal that genes encoding canonical Wnt attenuators and antagonists are highly expressed in early mammary mesenchyme and drop as branching ensues. The low expression of these negative regulators of Wnt signaling in salivary mesenchyme is proposed as an explanation for its growth and branch stimulating capability. In keeping with these observations, the authors show that experimental activation of mammary mesenchymal Wnt signaling augments both growth and branching. Lastly, they identify transcriptomic changes in IGF1 coincident with the initiation of mammary branching and confirm its role by extending analyses of the effects of gain and loss of function of IGF1 on embryonic mammary development.

      This is a thorough, well-constructed paper that adds new knowledge and important conceptual nuance and mechanistic insight to classical findings on branch patterning. This work is a technical tour de force and backed by solid quantitative and statistical analysis throughout. Their experimental approach is superb and the conclusions are sound. Their findings will be of great interest to the community of mammary gland biologists and to the wider field of embryologists focused on early development of a broad range of ectodermal appendages.

      I have some minor criticisms that I believe can be quickly remedied in a minor rewrite and suggestions for the authors' consideration to improve the manuscript discussion as follows:

      Minor issues

      Abstract, line 37: The authors misuse the word "decompose" - it should be "deconstruct"

      Our response: We thank the reviewer for pointing out our mistake, which we have corrected in the revised manuscript.

      Results, p7 line 48: Add "The" to the sentence: "The majority...."

      Our response: Corrected in the revised manuscript.

      P8 line 173 This sentence refers to Figure 2G which is a quantitative plot. I would suggest replacing the word "cluster", which implies a spatial organization, with the word "subset" or "significant fraction". The spatial data in Fig 2d support basal bias but do NOT to my eye show any clustering - in fact the proliferative basal cells appear to be evenly dispersed within the basal layer.

      Our response:

      We thank the reviewer for highlighting this aspect. We agree that “significant fraction” is a more suitable term than “cluster”.

      P9 line 188: The statement on basal cell lineage specification needs a reference.

      Our response:

      Following the suggestions from reviewers #1 and #3, we have removed the content about lineage segregation in Results, together with this sentence.

      P10 line 201-216 I found the section on lineage specification (fig S2) weaker than the rest and a distraction from the main thrust of the paper making it difficult for the reader to focus. I suggest omitting this section and supplemental figures associated with it altogether.

      Our response:

      We agree with all reviewers in that this part of the manuscript was not mature enough and provided only indirect evidence on the potential link between lineage segregation and branching ability. This is an important question in the field that merits a study of its own that should be addressed with better tools than those available to us at present. As suggested by reviewers #1 and #3, we have omitted this part in the revised manuscript.

      P9 line 190: "displays precocious onset of branching" it is sufficient to say: displays precocious branching - the use of both "precocious" and "onset' is redundant.

      P10 line 229 Similarly, delete "the onset of branching was delayed" it is sufficient to say: branching was delayed.

      Our response: Both sentences have been corrected in the revised manuscript.

      P11 line 243: Delete "on the regulation of the" and substitute the word "to" in the sentence: "Next, we shifted our focus on the regulation of the branching pattern, which is thought to be determined by mesenchymal cues."

      Our response: Corrected in the revised manuscript.

      P11 line 241 subtitle and Figure 4 title: The disparity in titles here is jarring for the reader: Results text subtitle: "Salivary gland mesenchyme is rich in growth-promoting cues, but does not alter the mode of branch point formation of the mammary epithelium". Figure 4 Title: "Mammary mesenchyme is indispensable for the branching ability of the mammary gland". I suggest that the authors divide the figure as well as the text to make the two points indicated by their disparate titles separately.

      Our response: We thank the reviewer for the suggestion to clarify the Results part of the manuscript. As suggested, we have split the data under two separate subtitles, but due to limitations in figure numbers, we prefer to report these data in one figure panel.

      P12 line 279 From here on out the manuscript has a tendency to use the term "growth" ambiguously - in many instances it is unclear do they mean expansion, proliferation, increased branch number/ morphology?? Please try to clarify.

      Our response:

      Our aim is to use the term growth to mean tissue growth (expansion). We hope that this is clearer in the revised manuscript.

      P16 line 341 use word "prompted" instead of word "promoted"

      Our response: We thank the reviewer for spotting the slip, which we have corrected in the revised manuscript.

      P16 line 382: include word "embryonic" before "mammary development"

      Our response: We have modified the text in the revised manuscript.

      Discussion

      P18 line 416: Add the words "later stage (E16.5)" to the sentence: "Importantly, we demonstrate that salivary gland mesenchyme could only promote the growth of later stage (E16.5) mammary epithelium"

      Our response: We thank the reviewer for the suggestion. We have modified the text in the revised manuscript.

      P19 line 437: Given the authors' statement "Instead, cell motility is critical for branch point formation in the mammary gland" they should consider a brief sentence mentioning their transcriptomic findings on cadherin 11 and Tenascin.

      Our response: We thank the reviewer for the appreciation of our transcriptomic data. In the revised manuscript, we have added the following text to the Discussion: “Accordingly, we observed significantly increased expression of cell migration promoting genes such as Cdh11 (encoding Cadherin 11), and Tnc (encoding Tenascin C) 60,61 in the E16.5 mesenchyme compared to E13.5 (Supplementary Table 2).”

      P19 line 451: Similarly, given their statement "This observation suggests that mammary epithelium itself carries the instructions dictating the mode of branching" they could consider their transcriptomic data on Ltbp1 in "mammary specific" clusters 7,8,9 as a matrix molecule initially expressed by mammary mesenchyme but which becomes expressed by luminal epithelial cells at precisely the time they acquire lineage specification and intrinsic branching capability.

      Our response: This is an excellent suggestion. We have added the following text to the Discussion: “It is worth noting that certain mesenchymal factors, such as Ltbp1, began transitioning towards epithelium-specific expression around E16.5 69. Exploring the potential impact of these factors on the self-instructed branching capacity of the mammary epithelium could yield valuable insights.”

      P20 lines 462-470 The authors should address their theory of Wnt suppression in the mammary mesenchyme in the context, albeit conflictingly, of earlier studies showing expression of Wnt signaling reporters, in either epithelial or mesenchymal locations during early stages.

      Our response: We thank the reviewer for the suggestion. In the preliminary revised manuscript, we report Axin2 expression data as new Supplementary Fig. 4c. Axin2 expression data suggest that Wnt/β-catenin activity is lowest in the E16.5 fat pad (where branching takes place) compared to all other tissues analyzed in the study.

      Plan for the final revision:

      For the final revised manuscript, we will additionally generate transgenic Wnt reporter expression data (see also our response to point 3 of Reviewer #1). These results will be discussed in light of the published Wnt reporter literature in the final revised manuscript.

      Reviewer #3 (Significance (Required)):

      Here the authors use classical embryonic tissue recombination and pharmacological manipulation of explants in conjunction with cutting edge 3D imaging of tissue derived from highly sophisticated reporter and knock-out mouse models and state of the art transcriptomic analysis to masterfully delineate and dissect regulatory pathways critical for embryonic mammary development. Specifically, they set out to parse regulation of proliferation from that of branch patterning.

      This is a thorough, well-constructed paper that adds new knowledge and important conceptual nuance and mechanistic insight to classical findings on branch patterning. This work is a technical tour de force and backed by solid quantitative and statistical analysis throughout. Their experimental approach is superb and the conclusions are sound. Their findings will be of great interest to the community of mammary gland biologists and to the wider field of embryologists focused on early development of a broad range of ectodermal appendages.

      Our response:

      We much appreciate the positive evaluation of our manuscript. We have addressed all the feedback provided by Reviewer 3 in the preliminary revised manuscript, except the last point, which will be included in the final revision along with the new data on the Wnt reporter expression.

      Field of expertise: Embryonic and adult mammary development, Wnt signaling, cell adhesion

    1. Author Response

      eLife assessment

      This useful paper examines changes (or lack thereof) in birds' fear response to humans as a result of COVID-19 lockdowns. The evidence supporting the primary conclusion is currently inadequate, because the model used does not properly account for many potentially confounding factors that could influence the study's outcomes. If the analytic approach were improved, the findings would be of interest to urban ecologists, behavioral biologists and ecologists, and researchers interested in understanding the effects of COVID-19 lockdowns on animals.

      Many thanks for these supportive words. We did our best to improve our manuscript according to the reviewers' and editor's comments. Importantly, we regret being unclear in the Methods, as our models already controlled for most of the confounds (see below) discussed by the reviewers.

      For example, given that a single observer collected the data at most sites, site as a random intercept in the models controls also for the observer effects (which is one of the reasons why site is in the model). We added details to Methods (L352-356, see also “Statistical analyses” in the main text).

      The first reviewer asked us to use “some measure of urbanity (e.g. Human Footprint Index) that varies across the cities included here”. Our main results are now based on country-specific models and hence, the use of a single value predictor for each city is not appropriate. Please, see also below.

      The second reviewer is concerned about multicollinearity in our models because of the 0.95 correlation between Period and Stringency Index. However, these are key predictor variables of interest that have never been used within the same model as predictors. We now clearly explain this in the Methods (L458-538, 548-550) and in the legend of Figure S2.
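
      To illustrate why a 0.95 correlation would matter only if both predictors entered the same model, the short Python sketch below computes the variance inflation factor (VIF) for the two-predictor case, VIF = 1 / (1 - r^2). The function name is hypothetical and the calculation is a generic textbook formula, not part of the authors' analysis; the only value taken from the response is r = 0.95.

```python
def variance_inflation_factor(r: float) -> float:
    """VIF for one of two predictors whose pairwise correlation is r.

    With exactly two predictors, the R^2 from regressing one on the
    other equals r**2, so VIF = 1 / (1 - r**2).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# r = 0.95 is the Period vs. Stringency Index correlation quoted above.
vif = variance_inflation_factor(0.95)
print(f"VIF if both predictors shared one model: {vif:.2f}")
```

      A VIF of this size would inflate coefficient variances roughly tenfold in a joint model; fitting Period and Stringency Index in separate models, as the response describes, avoids the problem entirely.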

      The third reviewer suggested that our models would benefit from controlling for day in the species-specific breeding cycle. Although we don’t have precise city-specific information on the timing of breeding stages in the sampled populations of birds, we partly control for these effects by including a random intercept of day within each year and species. This random factor explained most of the variance (see Table S1-S2) – something that could have been expected. In other words, we do control for what the third reviewer asked for. Similarly, we account for habitat features that may influence escape distance by including site in the models. Site usually refers to a specific park (we assume that within-park heterogeneity is lower than between park variation) and hence partly addresses the reviewer’s concern. Again, we highlight this within the Methods (L466-476).

      Reviewer #1 (Public Review):

      This paper uses a series of flight initiation "challenges" conducted both prior to and during COVID-19-related restrictions on human movement to estimate the degree to which avian escape responses to humans changed during the "anthropause". This technique is suitable for understanding avian behavioral responses with a high degree of repeatability. The study collects an impressive dataset over multiple years across five cities on two continents. Overall the study finds no effect of lockdown on avian escape distance (the distance at which the "target" individual flees the approaching observer). The study considers the variable of interest as both binary (during lockdown or prior to lockdown) and continuous, using the Oxford Stringency Index (with neither apparently affecting escape distance). Overall this paper presents interesting results which may suggest that behavioral responses to humans are rather inflexible over "short" (~2 year) timespans. The anthropause represents a unique opportunity to disentangle the mechanistic drivers of myriad hypothesized impacts humans have on the behavior, distribution, and abundance of animals. Indeed, this finding would provide important context to the larger body of literature aimed at these ends.

      Thank you very much for your positive feedback.

      However, the paper could do more to carefully fit this finding into the broader literature and, in so doing, be a bit more careful about the conclusions they are able to draw given the study design and the measures used. Taking some of these points (in no particular order):

      Thank you. We did our best in addressing your comments (see below and updated Methods, Results and Discussion sections).

      1) Oxford Stringency Index is a useful measure of governmental responses to the pandemic and it's true that in some scenarios (including the (Geng et al. 2021) study cited by this paper) it can correlate with human mobility. However, it is far from a direct measure of human mobility (even in the Geng study, to my reading, the index only explained a minority of the variation). Moreover, particular sub-components of the index are wholly unrelated to human mobility (e.g. would changes to a country's public information campaign lead to concomitant changes in urban human mobility?). Finally, compliance with government restrictions can vary geographically and over time (i.e. we might expect lower compliance in 2021 than in 2020) and the index is calculated at the scale of entire countries and may not be very reflective of local conditions. Overall this paper could do more to address the potential shortcomings of the Oxford Stringency Index as a measure of human mobility including attempting to validate the effect on human mobility using other datasets (e.g. the google dataset and/or those discussed in (Noi et al. 2022). This is of critical importance since the fundamental logic of the experimental design relies on the assumption that stringency ~ mobility.

      Thank you for this comment. First, the Oxford Stringency Index seemed to us the best available index for our purposes, i.e. to estimate people's mobility during the shutdown, because restrictions surely influenced the possibility that people would be outside and because the index is a country-specific estimate. In addition, we have now checked all indices mentioned in Noi et al. 2022 and found only the Google Mobility Reports useful. We now use them because they (a) are publicly available, (b) are also available for territories outside the US, and (c) provide data for each city included in our dataset as well as for urban parks, where most of our data were collected. Note that some platforms (e.g. Apple) no longer provide their mobility data.

      However, Google Mobility provides day-to-day variation in human mobility, whereas we are interested in overall increase/decrease in human mobility. Nevertheless, we correlated the Google mobility index with the Stringency index and found that human mobility generally decreases with the strength of the anti-pandemic measures adopted in sampled countries (albeit the effect for some countries, e.g. Poland, is small; Fig. 5).

      Moreover, we also added an analysis using the # of humans recorded directly in the field during escape trials (e.g. Fig. 6 and S6) and found that the link between the # of humans and the Stringency Index or Google Mobility was weak and noisy, with 95% CIs widely crossing zero (Fig. 6).

      Importantly, if we use Google Mobility and # of humans, respectively, as predictors of escape distance, the results are qualitatively very similar to results based on Oxford Stringency Index (Fig. S6), or Period, with tiny effect sizes for both (95%CIs for Google Mobility -0.3 – 0.06, Table S5, for # of humans -0.12 – 0.02, Table S6) supporting our previous conclusions.
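
      The statement that both effects are indistinguishable from zero can be checked directly from the two intervals quoted above. The following Python sketch uses only those reported bounds; the helper name and dictionary layout are hypothetical and introduced purely for illustration.

```python
def interval_includes_zero(lower: float, upper: float) -> bool:
    """Return True when a confidence interval [lower, upper] spans zero."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return lower <= 0.0 <= upper


# 95% CI bounds as quoted in the response (Tables S5 and S6).
reported_intervals = {
    "Google Mobility": (-0.3, 0.06),
    "Number of humans": (-0.12, 0.02),
}

spans_zero = {
    predictor: interval_includes_zero(lower, upper)
    for predictor, (lower, upper) in reported_intervals.items()
}

for predictor, result in spans_zero.items():
    print(f"{predictor}: interval includes zero = {result}")
```

      Both intervals include zero, which is consistent with the response's reading that neither predictor shows a clear effect on escape distance.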

      Note that Google Mobility and the number of humans have their limitations (see our comment to the editor and the Methods section in the main manuscript, e.g. L418-433). The lack of Google Mobility data for years before the COVID-19 pandemic does not allow us to fully explore whether overall human activity decreased during COVID-19 or not (our test for the periods prior to and during COVID-19). If the year 2022 reflects a return to “normal” (which is disputable given the COVID-19-driven rise in home office use), then 2020 and 2021 had on average lower levels of human activity (Fig. 4). Whether such a difference is biologically meaningful to birds is unclear given the immense day-to-day change in human mobility and presence (Fig. 4). Moreover, the number of humans captures within- and between-day variation rather than long-term changes in human presence.

      We added details on the new analyses to the Methods and Results sections (e.g. Fig. 4-6; L142-165, 418-438, 495-535) and the Supplementary Information (Figs. S5-S9 and associated Tables) and discuss the issue accordingly. Moreover, to enhance clarity about country-specific effects (or their lack), we also added country-specific estimates to the Results (Fig. 1 and Fig. S6 and respective Tables). Finally, our statistical design and the random structure of the model allowed us to control for spatial and temporal variation in compliance with government restrictions.

      2) The interpretation of the primary finding (that behavioral responses to humans are inflexible) could use a bit more contextualization within the literature. Specifically, the study offers three potential explanations for the observed invariance in escape response: 1) these behaviors are consistent within individuals and this study provides evidence that there was no population turnover as a result of lockdowns; 2) escape response is linked to other urban adaptations such that to be an urban-dwelling species dictates escape response; and/or 3) these populations already exhibit maximum habituation and the reduction in human mobility would only have increased that habituation but that trait is already at a boundary condition. Some comments on each of these respectively:

      Thank you for these comments. We incorporated them in the main text (L293-329). Your point 1) corresponds to our point (i): “Most urban bird species in our sample may be relatively inflexible in their escape responses because the species may be already adapted to human presence” (L293-306); your point 2) to our point (ii): “Urban environment might filter for bold individuals (Carrete and Tella, 2013, 2010; Sprau and Dingemanse, 2017). Thus, the lack of consistent change in escape behaviour of urban birds during the COVID-19 shutdowns may indicate an absence (or low influx) of generally shy, less tolerant individuals and species from rural or less disturbed areas into the cities…” (L307-314); your point 3) to our point (iii): “Urban birds might have been already habituated to or tolerant of variation in human presence, irrespective of the potential changes in human activity patterns” (L315-329). To distinguish between (ii) and (iii) or the two from (i), individually-marked birds and comprehensive genetic analyses are needed, which we now note in the Discussion (L330-348). Importantly, we also discuss that the lack of response might be due to relatively small changes in human activity (L253-292), which we unfortunately could not fully quantify.

      a) Even had these populations turned over as a result of a massive rural-to-urban dispersal event, it's not clear that the escape distance in those individuals would be different because this paper does not establish that these hypothetical rural birds have a different behavioral response which would be constant following dispersal. Thus the evidence gathered here is insufficient to tell us about possible relocations of the focal species.

      Thank you for this point. We address this point in the Introduction and Discussion (L92-101, 307-314). Rural bird populations/individuals are on average less tolerant of humans than urban birds (e.g. Díaz et al. 2013, PloS One 8:e64634; Tryjanowski et al. 2020, J Tropic Ecol 36:1-5; Mikula et al. 2023, Nat Commun 14:2146) and at the same time, bird individuals seem consistent in their escape responses (Carrete & Tella 2010, Biol Lett 23:167–170; Carrete & Tella 2013, Sci Rep 3:1–7).

      Additionally, the paper cites several papers that found no changes in abundance or movements of animals in response to lockdowns but ignore others that do. For example: (Wilmers et al. 2021), (Warrington et al. 2022) (though this may have been published after this was submitted...), and (Schrimpf et al. 2021).

      We added the papers (L89-91). Thank you!

      There is a missed opportunity to consider the drivers of some of these results - the findings in this paper are interesting in light of studies that did observe changes in space use or abundance - i.e. changes in space use could arise precisely because responses to humans are non-plastic but the distribution and activities of humans changed.

      Thank you. Indeed, we now address this in the Discussion (L303-306): “However, some studies reported changes in the space use by wildlife (Schrimpf et al., 2021; Warrington et al., 2022; Wilmers et al., 2021), and these could arise, as our results indicate, from fixed and non-plastic animal responses to humans who changed their activities”.

      To wit, the primary finding here would imply that the reaction norm to human presence is apparently fixed over such timescales - however, and critically, the putative reduction in human activity/mobility combined with fixed responses at the individual level might then imply changes in avian abundance/movement/etc.

      Unfortunately, we have not measured changes in avian abundance or movements. Please note, however, that the change in human mobility in the sampled cities might not be as dramatic as initially thought, and we consider this scenario the most plausible explanation for the lack of significant differences in avian escape responses before and during the COVID-19 shutdowns (see Fig. 4). Nevertheless, we added your point to the Discussion: if our findings imply that in birds the reaction norm to human presence is fixed over the studied temporal scale, the putative changes in human presence might then imply changes in avian abundance or movement (L293 and text below it).

      b) If this were the case, wouldn't this be then measurable as a function of some measure of urbanity (e.g. Human Footprint Index) that varies across the cities included here? Site accounted for ~15% of the total variation in escape distance but was treated as a random effect - perhaps controlling for the nature of the urban environment using some e.g. remotely sensed variable would provide additional context here.

      Urbanity mirrors the long-term level of human presence in cities, whereas we were interested mainly in the rather short-term effects of potential changes in human presence on bird behaviour. Thus, we are not sure how adding such a variable would help elucidate the current results. Please also note that we added the country-specific analysis. Site indeed accounted for a considerable amount of the total variance in escape distance, and that is why it was included as a random intercept, which controls for the non-independence of data points from each site. This could partly help us to control for differences in habitat type (e.g. urbanization level) within cities.

      c) Because it's not clear the extent to which the populations tested had turned over between years, the paper could do with a bit more caution in interpreting these results as behavioral. This study spans several years so any response (or non-response) is not necessarily a measure of behavioral change because the sample at each time point could (likely does) represent different individuals. In fact, there may be an opportunity here to leverage the one site where pre-pandemic measures were taken several years prior to the pandemic. How much variance in the change in escape distance is observed when the gap between time points far exceeds the lifetime of the focal taxa versus measures taken close in time?

      We believe the initial Fig. S4, now Figure 2, addresses this point. The between-year temporal variation in FIDs exceeds the variation due to lockdowns. This is true both for measures taken in consecutive years and for measures taken far apart.

      d) Finally, I think there are a few other potential explanations not sufficiently accounted for here:

      i) These behaviors might indeed be plastic, but not over the timescales observed here.

      We agree and have added this point (L301-303). Thank you.

      ii) Time of year - this study took place during the breeding season. The focal behavior here varies with the time of year, for example, escape distance for many of these species could be tied up in nest defense behaviors, tradeoffs between self-preservation and e.g. nest provisioning, etc.

      Please, note that we controlled for the date in our analyses. Date was used as a proxy for the progress in the breeding season (L463-464 and Fig. 1 caption). Note that we collected data only from foraging or resting individuals, and data were neither collected near the nest sites nor from individuals showing warning behaviours, which we now note (L400-401).

      iii) Escape behaviors from humans are adaptively evolved, strongly heritable, and not context dependent - thus we would only expect these behaviors to change on evolutionary timescales.

      We discussed this at L307-308 and 381-383. Escape behaviors from humans are highly consistent for individuals, populations, and species (Carrete & Tella 2010, Biol Lett 23:167–170; Díaz et al. 2013, PloS One 8:e64634; Mikula et al. 2023, Nat Commun 14:2146). Whether such behavior is consistent across contexts is less clear (e.g. Diamant et al. 2023, Proc Royal Soc B, in press; but see, e.g. Radkovic et al. 2019, J Ecotourism 18:100-106; Gnanapragasam et al. 2021, Am Nat 198:653-659). Escape distance is often not measured simultaneously with, for example, human presence. In other words, whereas the general level of human presence may have no effect on escape distance, the day-to-day or hour-to-hour variations might. We need studies on fine temporal scales (day-to-day or hour-to-hour) using marked individuals to elucidate this phenomenon.

      iv) See point one above - it's possible that the lockdown didn't modify human activity sufficiently to trigger a behavioral response or that the reaction norm to human behavior is non-linear (e.g. a threshold effect).

      We agree. We now also use Google Mobility Reports and # of humans data to elucidate this phenomenon and have added such interpretations at L253-292 and in, e.g., Fig. 4.

      LITERATURE CITED

      Geng DC, Innes J, Wu W, Wang G. 2021. Impacts of COVID-19 pandemic on urban park visitation: a global analysis. J For Res 32:553-567. doi:10.1007/s11676-020-01249-w

      Noi E, Rudolph A, Dodge S. 2022. Assessing COVID-induced changes in spatiotemporal structure of mobility in the United States in 2020: a multi-source analytical framework. Int J Geogr Inf Sci.

      Schrimpf MB, Des Brisay PG, Johnston A, Smith AC, Sánchez-Jasso J, Robinson BG, Warrington MH, Mahony NA, Horn AG, Strimas-Mackey M, Fahrig L, Koper N. 2021. Reduced human activity during COVID-19 alters avian land use across North America. Sci Adv 7:eabf5073. doi:10.1126/sciadv.abf5073

      Warrington MH, Schrimpf MB, Des Brisay P, Taylor ME, Koper N. 2022. Avian behaviour changes in response to human activity during the COVID-19 lockdown in the United Kingdom. Proc Biol Sci 289:20212740. doi:10.1098/rspb.2021.2740

      Wilmers CC, Nisi AC, Ranc N. 2021. COVID-19 suppression of human mobility releases mountain lions from a landscape of fear. Curr Biol 31:3952-3955.e3. doi:10.1016/j.cub.2021.06.050

      Reviewer #2 (Public Review):

      Mikula et al. have a large experience studying the escape distances of birds as a proxy of behavioral adaptation to urban environments. They profited from the exceptional conditions of social distance and reduced mobility during the covid-19 pandemic to continue sampling urban populations of birds under exceptional circumstances of low human disturbance. Their aim was to compare these new data with data from previous "normal" years and check whether bird behavior shifted or not as a consequence of people's lockdown. Therefore, this study would add to the growing body of literature assessing the effect of the covid-19 shutdown on animals. In this sense, this is not a novel study. However, the authors provide an interesting conclusion: birds have not changed their behavior during the pandemic shutdown. This lack of effects disagrees with most of the previously published studies on the topic. I think that the authors cannot claim that urban birds were unaffected by the covid-19 shutdown. I think that the authors should claim that they did not find evidence of covid-19-shutdown effects. This point of view is based on some concerns about data collection and analyses, as well as on evolutionary and ecological rationale used by the authors both in their hypotheses and results interpretation. I will explain my criticisms point by point:

      We are grateful for your positive appraisal of our manuscript, as well as for your helpful critical comments. We toned down the discussion to claim, as suggested by you, that we did not find evidence for effects of covid-19-shutdowns on escape behaviour of birds in urban settings (see Results and Discussion sections). In general, we attempted to provide a more nuanced discussion and reporting of our findings. We also changed the manuscript title to “Urban birds' tolerance towards humans was largely unaffected by the COVID-19 shutdowns” and added validation using Google Mobility Reports (Fig. 5 & S6, Table S3a and S5) and the actual number of humans (Fig. 6 and S6; Table S3b-e and S6). Note however that there is only a single robust study on the topic of shutdown and animal escape distances (Diamant et al. 2023, Proc Royal Soc B, in press), i.e. the topic is largely unexplored (e.g. L99-101), whereas we discuss our finding in light of shutdown influences on other behaviours (L293-329).

      1) The authors used ambivalent, sometimes contradictory, reasoning in their predictions and results interpretation. Some examples:

      We tried to clarify our reasoning and increased consistency in our claims in the Introduction. Please, note that we simplified the Introduction and now provide one main expectation: FIDs of urban birds should increase with decreased human presence. This pattern is robustly empirically documented, regardless of the mechanism involved (e.g. Díaz et al. 2013, PloS One 8:e64634; Tryjanowski et al. 2020, J Tropic Ecol 36:1-5; Mikula et al. 2023, Nat Commun 14:2146). Please, see our revised Discussion for a more comprehensive discussion of mechanisms which could explain the patterns described in our study.

      1.1) The authors claimed that urban birds perceive humans as harmless (L224), but birds actually escape from us, when we approach them... Furthermore, they escape usually 5 to 20 m away. This is more distance that would be necessary just to be not trampled.

      We agree and have deleted mentions that humans are perceived as harmless.

      1.2) If we are harmless, why birds should spend time monitoring us as a potential threat (L102)? Indeed, I disagree with the second prediction of the authors. I could argue that reduced human activity should increase animal vigilance because real bird predators (e.g. raptors) may increase their occurrence or activity in empty cities. If birds should increase their vigilance because the invisible shield of human fear of their predators is no longer available, then I would expect longer escape distances.

      Thank you for this comment. We deleted this prediction and largely rewrote Introduction based on your comments and comments from the other reviewers.

      1.3) To justify the same escape behavior shown by birds in pre- and pandemic conditions from an adaptive point of view, the authors argued a lack of plasticity and a strong genetic determination of such behavior. This contravenes the plasticity proposed in the previous point or the expected effect of the stringency index (L112).

      We have now attempted to write this more clearly while incorporating your suggestions. In the Discussion, we now propose various hypotheses that may, but need not, be mutually exclusive. Please note that we simplified the Introduction and now provide one main hypothesis: FIDs of urban birds should increase with decreased human presence.

      In my opinion, some degree of plasticity in the escape behavior would be really favorable for individuals from an adaptive perspective, as they may face quite different fear landscapes during their lives. Looking at the figures, one can see notable differences in the escape distance of the same species between sites in the same city. As I can hardly imagine great genetic differences between birds sampled in a park or a cemetery in Rovaniemi, for instance, I would expect a major role of plasticity to explain the observed variability. Furthermore, if escape behavior would not be plastic, I would not expect date or hour effects. By including them in their models, the authors are accepting implicitly some degree of plasticity.

      We regret being unclear. We do accept some degree of plasticity. Yet, our study design prohibits the assessment of the degree of individual plasticity because sampled birds were not individually marked and approached repeatedly. We tried to soften the statements in our Discussion to not fully dismiss a possibility that urban birds have some degree of plasticity in their antipredator behaviour (L293-329). Note however, that while our data collection was not designed to test how hour-to-hour changes in human numbers influence escape distance, the effect of the number of humans (i.e. hour-to-hour variation in human numbers) in our sample was tiny.

      The date and hour effect simply control for the particularities of the given day and hour (e.g. warm vs cold times or the time until sunset). In other words, the within species differences (even from the same park) may have little to do with individual plasticity, but instead may reflect between individual differences. We now add this issue to Methods (L471-476): “This approach enabled us to control for spatial and temporal heterogeneity and specificity in escape behaviour of birds (e.g. species-specific responses, changes in escape distances with the progress in the breeding season, spatial and temporal variation in compliance with government restrictions or particularities of the given day and hour)....”

      2) Looking at the figures I do not see the immense stochasticity (L156, Fig. S3, S5) claimed by the authors. Instead, I can see that some species showed an obvious behavioral change during the shutdown. For instance, Motacilla alba, Larus ridibundus, or Passer domesticus clearly reduced their escape distances, while others like the Dendrocopos major, Passer montanus, or Turdus merula tended to increase it.

      At L138-141 and 327-329 we discussed the within and between genera and cross-country variation and stochasticity in response to the shutdowns (Fig. 2). The reference to species-specific plots was perhaps a little bit misleading. We think that the essential figure, that we now reference at this point, is Figure 2 that shows the temporal trends and/or stochasticity that seem to have little in common with lockdowns. Please, also look at Figure 3 and S3-S4. These show that in all selected genera/species, the trends did not significantly deviate from central regression line which indicates no change in FID before and during the COVID-19 shutdowns.

      On the other hand, birds in Poland tended to have larger escape distances during the shutdown for most species, while in Rovaniemi there was an apparent reduction of escape distances in most cases. The multispecies and multisite approach is a strength of this study, but it is an Achilles' heel at the same time. The huge heterogeneity in bird responses among species and sites counterbalanced and as a result, there was an apparent lack of shutdown effects overall. Furthermore, as most data comes from a few (European) species (i.e. Columba, Passer, Parus, Pica, Turdus, Motacilla) I would say that the overall results are heavily influenced (or biased) by them. The authors realize that results are often area- or species-specific (L203), therefore, does a whole approach make sense?

      We are grateful for this valuable comment. We believe the general approach makes sense as there is a general expectation about how birds should respond to changes in human presence. That is why we control for non-independence of data points in our sample. Thus, although lots of data come from a few European species, this is corrected for by the model. Note that given the sheer number of sampled species, some site- or species-specific trends may have occurred by chance. Importantly, we believe that Figure 2, with species-site specific temporal trends, reveals that the between-year stochasticity in escape distances seems greater than any effects of lockdowns. Nevertheless, we have further dealt with this issue in the revised manuscript by running country-specific models, which again clearly showed no significant effect of Period on escape behaviour of birds (including no effects in Poland and Finland).

      3) The previous point is worsened by the heterogeneity of cities and periods sampled. For instance:

      3.1) I can hardly imagine any common feature between a small city in northern Finland (Rovaniemi) and a megacity in Australia (Melbourne). Thus, I would not be surprised to find different results between them.

      3.2) Prague baseline data was for 2014 and 2018, while for the rest of the study sites were for 2018 and 2019. If study sites used a different starting point, you cannot compare differences at the final point.

      We are slightly confused by these comments.

      3.1) The cities are expected to be different but (i) the difference may be smaller than imagined (e.g. park structures, managed grass cover, few shrubs and deciduous-dominated tree species) and (ii) we expect the effects of lockdowns to be similar across cities. Whether we have no people in Rovaniemi parks (which despite Rovaniemi’s small size are usually extremely well-visited) or no people in Melbourne parks should not make a difference in principle. Note, however, that to avoid overconfident conclusions, we allow for different reaction norms within cities. Please also note that we are now providing country-specific results, which should identify whether shutdowns led to different reactions in the sampled countries. We found no strong effect of shutdowns in any of the sampled countries/cities.

      3.2) Because of the possible between-site differences at the starting point, we use study site as a random intercept and control for the between-site reaction norms by including the random slope of the period. In other words, such possible differences do not influence the outcomes of our models. Regardless, our a priori expectation is that the human activity levels in a given park were similar prior to COVID-19, and hence in 2014, 2018, and 2019. Again, we are now providing country-specific results, which identify whether shutdowns led to different reactions in the sampled countries, which they mostly did not.

      3.3) Due to the obvious seasonal differences between the northern and southern hemispheres, data collection in Australia began five months later than in the rest of the sites (Aug vs Mar 2020). There, urban birds faced already too many months of reduced human disturbances, while European birds were sampled just at the beginning of the lockdown.

      We agree that each city or even park within the city has its specific environmental conditions (here including the time point of lockdown). That is why we control for city and park location in the random structure of the model (see Methods section). We now add results per country that show no clear differences (e.g. Fig. 1).

      However, the aim of our study was to test for general, global effects of lockdowns, which are minimal. Note that we now specifically test for country-specific effects in separate models on each country (e.g. Fig. 1, Fig. S6) but all country-specific effects are small and still centre around zero.

      3.4) Some cities were sampled by a single observer, while others by many of them. Even if all of them are skilled birders, they represent different observers from a statistical point of view and consequently, observer identity was an extra source of noise in your data that you did not account for.

      We agree. In Finland and Hungary, data were collected by two closely cooperating observers. In Poland, all data were collected by a single observer. In the Czech Republic and Australia, a single observer (P.M. and M.W., respectively) sampled 46 sites out of 56 and 32 sites out of 37, respectively. Each site was sampled by the same observer both before and during the shutdowns. We now clearly state this in the Methods (L352-356). In other words, our models already largely control for the possible observer confound by having site as a random intercept. Moreover, a previous study showed that FID estimates do not vary significantly between trained observers (Guay et al. 2013, Wildlife Research, 40, 289-293).

      4) Although I liked the stringency index as a variable, I am not sure if it captured effectively the actual human activity every day. Even if restrictive measures were similar between countries, their actual accomplishment greatly depended on people's commitment and authorities' control and sanctions. I would suggest using a more realistic measure of human activity, such as google mobility reports.

      Thank you for this comment. We now validate the use of the stringency index with the Google Mobility Reports, showing that human mobility generally (albeit in some countries relatively weakly) decreases with the strength of governmental antipandemic measures. Please note that our main research question is related to the general change in human outdoor activity and not to the week-to-week, day-to-day or hour-to-hour changes captured by the stringency index, Google Mobility or the number of humans during an escape trial. Nevertheless, using Google Mobility and the number of humans as predictors led to results similar to those for the stringency index and Period (Fig. 1 and S6). Please see the extended discussion on this topic in our manuscript (L270-292).

      5) The authors used escape trials from birds on the ground and perched birds. I think that they are not comparable, as birds on the ground probably perceive a greater risk than those placed some meters above the ground, i.e. I would expect shorter escape distances for perched birds. As this can be strongly dependent on the species preferences or sampling site (i.e, more or less available perches), I wonder how this mixture of observations from birds on the ground and perched birds could be affecting the results.

      We have now added information that most birds were sampled when on the ground (79%). Importantly, previous studies have found that perch height has a minimal effect on FIDs (e.g. Bjørvik et al. 2015, J Ornithol 156:239–246; Kalb et al. 2019, Ethology 125:430-438; Ncube & Tarakini 2022, Afr J Ecol 60:533–543; Sreekar et al. 2015, Tropic Conserv Sci 8:505-512). We added this information to the Methods section (L394-395).

      6) The authors did not sample the same location in the same breeding season to avoid repeated sampling of the same individuals (L331). This precaution may help, but it does not guarantee a lack of pseudoreplication. Birds are highly mobile organisms and the same individuals may be found in different places in the same city. This pseudoreplication seems particularly plausible for Rovaniemi, where sampling points must be necessarily close due to the modest size of this city.

      We appreciate your concern. We cannot fully exclude the possibility of sampling some individuals twice. However, we sampled during the breeding season, when most birds are territorial and active in the areas around their nests, and hence an individual switching parks is unlikely. Also, most sampled birds in our study are passerines, which have small territories (typically a few hundred square meters). Some larger birds may have larger territories and move larger distances to forage (e.g. kestrels, which often forage outside cities), but these birds represent a minority of our records and we did not sample outside the cities.

      7) An intriguing result was that the authors collected data for 135 species during the shutdown, while they collected data only for 68 species before the pandemic. Such a two-fold increase in bird richness would not be expected with a 36% increase in sampling effort during 2020-21. I wonder if this could be reflecting an actual increase in bird richness in urban areas as a positive result of the shutdown and reduced human presence.

      There were 141 unique day-years before COVID and 161 during COVID. So, the sampling effort as calculated by days does not explain the difference in species numbers. Whether the actual effort, which was 381 vs 463 h of sampling, explains the difference is unclear, which we now note in the Methods (L476-483). If not, your proposition is possible, but we would like to avoid any speculation on this topic in the manuscript, as it is difficult to infer species diversity from FID sampling.

      8) The authors dismissed the multicollinearity problem of explanatory variables unjustifiably (L383). However, looking at fig. S1, I can see strong correlations between some of them. For instance, period and stringency index were virtually identical (r=0.95), while temperature and date were also strongly correlated.

      We are confused by this comment and think this reflects a misunderstanding. Period and stringency index are explanatory variables of interest that were never included in the same model, and hence their correlation does not contribute to within-model multicollinearity. To avoid further confusion, we note this in the legend of Fig. S2. However, we must be cautious when interpreting the results from the models on period, Google Mobility, # of humans and stringency index, as the four measures are similar.

      We discuss multicollinearity of explanatory variables within the manuscript (L458-538, 548-550) and note that, with the exception of temperature and day within the breeding season (r = 0.48), the correlations among explanatory variables were minimal. We thus used only temperature as an explanatory variable (i.e. fixed factor; also because temperature reflects both season and variation in temperature across a season), whereas day was included as a random intercept to control for pseudoreplication within day. Collinearity between all other predictors was low (|r| < 0.36).

      9) The random structure of the models is a key element of the statistical analyses but those random factors are poorly explained and justified. I needed to look up the supplementary tables to fully understand the complex architecture of the random part of the models. To the best of my knowledge, random variables aim to account for undesirable correlations in the covariance matrix, which is expected in hierarchical designs, such as the present one. However, the theoretical violation of data independence may happen or not. As the random structure is usually of little interest, you should keep it as simple as necessary, otherwise random factors may be catching part of data variability that you would like to explain by fixed variables. I think that this is what is happening (at least, in part) here, as the authors included a too-complex random structure. For instance, if you include the year as a random factor, I think that you are leaving little room for the period effect. The authors simplified the random structure of the models (L387), but they did not explain how. Nevertheless, this model selection was not important at all, as the authors showed the results for several models. I assume, consequently, that the authors are considering all these models equally valid. This approach seems quite contradictory.

      The random structure of the model controls for possible pseudoreplication in the data, that is for the cases where we have multiple data points that may not be independent and hence technically represent one. Apart from that, random structure tells us about where the variance in the data lies. This is often of interest and your previous questions about city, site or species specificities can be answered with the random part of the model. To follow up on your example, year is included in the model because data from a single year are not independent (for example because of delayed breeding season in one year vs. in another).

      We regret being unclear about the model specification and have attempted to clarify the methods (L466-476). We first specified a model with an ideal random structure that necessarily was complex (perhaps too complex). We then showed that using models with simpler random structures did not influence the outcomes. We now use a simpler model within the main text, but do keep the alternative models to show that the results are not dependent on the random structure of the model (Fig. S1 and Table S2).

      Reviewer #3 (Public Review):

      This study examined the changes in fear response, as measured by the flight initiation distances (FID), of birds living in urban areas. The authors examined the FIDs of birds during the pandemic (COVID-19 lockdown restrictions) compared to FIDs measured before the pandemic (mostly in 2018 & 2019). The main study justification was that human presence changed drastically during the pandemic lockdowns and the change in human presence might have influenced the fear response of birds as a result of changing the "landscape of fear". Human presence was quantified using a 'stringency' index (government-mandated restrictions). Urban areas were selected from within five different cities, which included four European cities (Czech Republic - Prague, Finland - Rovaniemi, Hungary - Budapest, Poland - Poznan), and one city in the global south (Australia - Melbourne). Using 6369 flight initiation distances across 147 different bird species, the authors found that FIDs were not significantly different before the pandemic versus during the pandemic, nor was the variation in FID explained by the level of 'stringency'.

      Major strengths: There are several strengths to this study that allows for understanding the variety of factors that influence a bird's response to fear (measured as flight initiation distances). This study also demonstrates that FIDs are highly variable between species and regions.

      Specifically,

      1) One of the major strengths of this paper is the focus on birds living in urban areas, a habitat type that is hypothesized to have changed drastically in the 'landscape of fear' experienced by animals during the pandemic lockdown restrictions (due to the presumed decrease in human presence and densities). Maintaining the focus on urban birds allowed for a deeper examination of the effect of human behaviour changes on bird behaviour in urban habitats, which are at the interface of human-wildlife interactions.

      2) This study accounted for several variables that are predicted to influence flight initiation distances in birds including species, genus, region (country), variability between years, pandemic year (pre- versus during), the strictness of government-mandated lockdown measures, and ecological factors such as the human observer starting distance, flock size, species-specific body size, ambient air temperature (also a proxy of the timing during the breeding season), time of day, date of data collection (timing within the regional [Europe or Australia] breeding season), and categorization of urban site type (e.g. park, cemetery, city centre).

      3) This study examined FIDs in two years previous to the pandemic (mostly 2018 and 2019, one site was 2014) which would account for some of the within- and between-year FID variation exhibited prior to the pandemic.

      4) This study uses strong statistical approaches (mixed effect models) which allows for repeat sampling, and a post hoc analysis testing for a phylogenetic signal.

      Thank you for your supportive and positive comments.

      Major weaknesses: The authors used government 'stringency' as a proxy for human presence and densities, however, this may not have been an accurate measure of actual human presence at the study sites and during measurements of FIDs. Furthermore, although the authors accounted for many factors that are predicted to influence fear response and FIDs in birds, there are several other factors that may have contributed to the high level of variation and patterns in FIDs observed during this study, thus resulting in the authors' conclusion that FIDs did not vary between pre- and during pandemic years.

      Thank you for your suggestions. We agree. To capture the general human presence in parks, we have now incorporated an analysis using Google Mobility Reports (Fig. S6b), which directly measure human mobility in each of the sampled cities and specifically in urban parks, where most of our data were collected. We also address the further concerns that you detail below. Although it is not the main interest of our study, we have also incorporated an analysis using the actual # of humans during an escape trial (Fig. S6c).

      Moreover, we think that including further possible confounds should not influence our conclusions. In other words, including further confounds will decrease the variance that can be explained by shutdowns and thus such shutdown effects (if any) would be tiny and hence likely not biologically meaningful.

      Specifically,

      1) The authors used "government stringency" as a measure of change in human activity, which makes the assumption that the higher the level of 'stringency', the fewer humans in urban areas where birds are living. However, the association between "stringency" and actual human presence at the study sites was not measured, nor was 'stringency' compared to other measures of human presence such as human mobility.

      Thank you for this essential comment. Initially, we viewed the Oxford Stringency Index as the best available index for our purposes. However, we now further acknowledge its limitations (L) and validate the Oxford Stringency Index with the Google Mobility Reports data, showing that the two indices are generally negatively (albeit sometimes weakly) correlated across sampled cities (i.e. human mobility decreases with increasing stringency index). Although other human presence indices were used in the past, e.g. Cuebiq, Descartes Labs, Maryland Uni index and Apple (see Noi et al. 2022, Int J Geograph Info Sci, 36, 585-616), we used only the Google Mobility index because (a) it is publicly available, (b) it is also available for territories outside the US, and (c) it provides data for urban parks within each city included in our dataset. Note, however, that Google Mobility data are inappropriate to answer our primary question, i.e. whether changes in human presence outdoors due to the COVID-19 shutdowns had any effect on avian tolerance towards humans. First, Google Mobility was available only for 2020-22, i.e. the baseline pre-COVID-19 data for 2018-2019 were unavailable. Thus, there was no way to check whether the human activity levels really changed during the COVID-19 years. Second, Google Mobility data are calculated as a change from the 2020 January–February baseline for each day of the week for each city and its location (here we used parks). In other words, the data are not comparable between days and cities, although we attempted to correct for this within the random structure of the mixed model. Also, the data may be influenced by extreme events within the 2020 Jan–Feb baseline period (see here). Third, Google Mobility varies greatly between days and across the season (see Fig. 4 & S5 or the first figure in these responses), likely more than the possible change due to shutdowns.
      Nevertheless, we found that results based on Google Mobility are qualitatively very similar to results based on the stringency index. Moreover, we showed that the relationships between # of humans and both Google Mobility and the stringency index (Figure 6) are weak and noisy, with 95% CIs widely overlapping zero (Table S3b-e). Also, similarly to other predictors of human presence, # of humans only poorly predicted changes in avian escape distances. We added details on the new analysis to the Methods, Results and Supplement (L134-165 and associated figures and tables, L415-535).
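
      To make the second point concrete, the sketch below shows how a weekday-specific baseline turns raw visit counts into percent changes. It is a simplified illustration and not Google's actual pipeline: we assume the baseline is the median count per day of the week over the early-2020 window, and the function names and all counts are invented for the example.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def weekday_baseline(baseline_counts):
    """Median visit count per weekday (0 = Monday) over the baseline window."""
    by_weekday = defaultdict(list)
    for day, count in baseline_counts.items():
        by_weekday[day.weekday()].append(count)
    return {weekday: median(counts) for weekday, counts in by_weekday.items()}


def percent_change(day, count, baseline):
    """Percent change of a day's count relative to the baseline for its weekday."""
    base = baseline[day.weekday()]
    return 100.0 * (count - base) / base


# Invented park-visit counts for three Mondays and three Saturdays in January 2020.
baseline_counts = {
    date(2020, 1, 6): 100, date(2020, 1, 13): 120, date(2020, 1, 20): 110,  # Mondays
    date(2020, 1, 4): 200, date(2020, 1, 11): 220, date(2020, 1, 18): 180,  # Saturdays
}
baseline = weekday_baseline(baseline_counts)

# The same raw count of 110 visitors yields different values on different weekdays.
monday_change = percent_change(date(2020, 4, 6), 110, baseline)
saturday_change = percent_change(date(2020, 4, 4), 110, baseline)
print(f"Monday: {monday_change:+.0f}%, Saturday: {saturday_change:+.0f}%")
```

      Because each weekday (and each city) is scaled by its own baseline, identical numbers of park visitors can map to different index values, which is why we treat the index as a relative measure and model day and city in the random structure.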

      2) There was considerable variation in FID measurements, which can be seen in the figures, indicating that most of the variation in FID was not accounted for in the authors' models.

      We are confused by this statement. The fact that the FIDs varied does not directly imply that our models failed to account for the variation. Nevertheless, we do control for most of the discussed confounds (see further answers below). Importantly, it is unclear how including further possible confounds should influence our conclusions, unless the lockdown effects are tiny, in which case they might not be biologically meaningful.

      Factors that may have contributed to variation in FIDs that were not accounted for in this study are as follows:

      a. The authors accounted for the date of data collection using the 'day' since the start of the general region's breeding season (Europe: Day 1 = 1 April; Australia: Day 1 = 15 August). Using 'day' since the breeding season started probably was an attempt to quantify the effect of the breeding stage (e.g. territory establishment, nest young, fledgling) on FIDs. However, breeding stages vary both within- and between species, as well as between sub-regions (e.g. Finland vs. Hungary). As different species respond to predation or human presence differently depending on the stage during their breeding cycle, more specificity in the breeding cycle stage may allow for explaining the observed variation and patterns in FID.

      We agree. Although we don’t have precise city-specific information on the timing of breeding stages in the sampled populations of birds, we partly control for these effects by including a random intercept of day within each year and species. This random factor explained a relatively high proportion of the variance in our data (see Table S1 and S2), perhaps as you expected.

      b. Variation in species-specific FIDs may also vary with habitat features within urban sites, such as the proximity of trees and other protective structures (e.g. perches and cover), the openness of the area, and the level of stressors present (e.g. noise pollution, distance to roads). Perhaps accounting for this habitat heterogeneity would account for the FID variation measured in this study.

      We agree. We don’t have such fine-scale data, but we included site identity (typically within a particular park or cemetery) which should account for the habitat heterogeneity among localities. Depending on the model, site explained relatively little variance (1-6%), indicating low heterogeneity between localities in these undescribed characteristics. Also note that park structure may be quite similar both within and between cities, i.e. managed green grass areas, with only a few shrubs and deciduous trees. Therefore, the possible minor habitat heterogeneity should not have any great impacts on our results.

      c. The authors accounted for species and genus within their models, however, FIDs may vary with other species-specific (or even specific populations of a species) characteristics such as whether the species/population is neophobic versus neophilic, precocial versus altricial, and the level of behavioural plasticity exhibited. These variables were not accounted for in the analysis.

      We agree that FIDs can be correlated with many possible factors. Here, we were interested in general patterns, while controlling for FID differences between species, as well as for possible species-specific reaction norms to lockdowns. Whether neophobic vs neophilic populations or precocial vs altricial species react differently to lockdowns might be of interest, but it is beyond the scope of this study. Note, however, that population and population-specific reaction norms explain little variation (Table S2a, 0-6% of variation), so such a confound should not substantially influence our conclusions. We do not have fine-scale data on the level of neophobia, but the effects of lockdowns seem similar for precocial (see Anas, Larus, Cygnus) and altricial (the remaining, mostly passerine) species in our dataset (see Fig. 3 and S3-S4). Please note that we sampled mainly adults (L386). Moreover, the effects for clades, which may differ in their cognitive skills, are also similar (e.g. Corvids vs. Anas or Cygnus; Fig. 3).

      d. Three different methods of measuring the distances between flight and the observer location were used, and FIDs were only measured once per bird, such that there were no measures of repeatability for a test subject. Thus, variation surrounding the measurement of FIDs would have contributed to the variation in FIDs seen during this study.

      While all observers were trained, the three methods may add some noise to the FID estimates. However, the FID estimates from a single method may still differ slightly between observers (as do well-standardized morphology measurements; Wang et al. 2019, PLoS Biology, 17, e3000156). Importantly, FID estimates are highly replicable among skilled observers (Guay et al. 2013, Wildlife Research 40:289-293), and we previously validated this approach and showed that distance measured by counting steps did not differ from distance measured by a rangefinder (Mikula 2014, Ardea 102:53-60), which we now explicitly state (L391-394). Importantly, we control for observer bias by specifying locality as a random intercept (see further details in our response to the Editor). Moreover, each site was sampled by the same observer both before and during the shutdowns.

      3) The sample design of this study may have influenced the FID variability associated with specific species, and specific populations of species. A different number of species were sampled across the time periods of interest; 68 species were sampled before the pandemic versus 135 species after the pandemic. However, the authors do not appear to have directly compared the FIDs for the same species before the pandemic compared to during the pandemic (e.g. the FIDs of Eurasian blackbirds before the pandemic versus during the pandemic). Furthermore, within the same country-city, it is unclear whether the species observed before the pandemic were observed at the same location (e.g. same habitat type such as the same park) during the pandemic. As a species' FID response may be influenced by population characteristics and features specific to each site (e.g. habitat openness), these factors may have influenced the variability in FID measurements in this study.

      We regret being unclear in our methods. Our full model uses all data, but alternative models (see e.g. Fig. S1) used only species with ≥5 or with ≥10 observations both before and during lockdowns. Importantly, Figures 2 and 3 depict data for species sampled at specific sites. We now clarify this within the Methods (L460-483), the Results (L125-133 and associated figures) and the figure legends (Fig. S1).

      4) The models in this study accounted for many factors predicted to affect FIDs (see the section on major strengths), however, the number of fixed and random factors are large in number compared to the total sample size (N =6369), such that models may have been over-extended.

      The number of predictors and random effects is well within the limits for the given sample size (Korner-Nievergelt et al. 2015. Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan). Importantly, simpler models give similar results as the more complex ones (Fig. S1) and the visual (model free) representations of our raw and aggregated data confirm our model results. This, we suggest, makes our findings robust and convincing.

      Overarching main conclusion

      Overall, this study examines factors influencing FIDs in a variety of bird species and concludes that FIDs did not differ during the pandemic lockdowns compared to before the pandemic (2019 and earlier). Furthermore, FIDs were not influenced by the strictness of government-mandated restrictions. Although the authors accounted for many factors influencing the measurement of FIDs in birds, the authors did not achieve their aim of disentangling the effects of pandemic-specific ecological effects from ecological effects unrelated to the pandemic (such as habitat heterogeneity).

      We find this statement confusing. We accounted for most relevant confounding factors and found little evidence for a strong effect of the pandemic. Moreover, we have now added country-specific analyses that confirm the lack of evidence, we highlight Figure 3, which shows no clear shutdown effect, and we also explore how levels of human presence changed over and within the years. Adding more possible confounds (albeit note that not many are left to add) might only further reduce the variation that could be explained by the pandemic, and hence such hypothetical effects of the pandemic would be, if anything, small and thus likely not biologically meaningful.

      Their findings indicate that FIDs are highly variable both within- and between- species, but do not strongly support the conclusion that FIDs did not change in urban species during the pandemic lockdown. Therefore, this study is of limited impact on our understanding of how a drastic change in human behaviour may impact bird behaviour in urban habitats.

      It is unclear why you think our study lacks support for the conclusion that FIDs changed little during the pandemic, if all results show no such effects. However, we toned down our Discussion and also highlighted potential issues linked to our approach (e.g. that sampled individuals were not marked and hence we cannot distinguish between various mechanisms that might explain the described pattern (L293-329), or that human presence may not have changed (L253-269)). For further details, see our previous response.

      Overall, the study demonstrates the challenges in using FIDs as a general fear response in birds, even during a pandemic lockdown when fewer humans are presumably present, and this study illustrates the large degree of variation in FIDs in response to a human observer.

      We appreciate and agree that our study demonstrates the challenges in quantifying human activity to understand bird escape distance and we added a paragraph on this topic to the discussion (L270-292).

      Nevertheless, we hope that our above responses clarify and address most of the issues you had with our manuscript. We tried to show that (a) most of your proposed controls are indeed included in our study design, models, and visualisations, and that (b) multiple lines of evidence (from models and visualisation of raw and aggregated data) support the conclusion of no overall effect. We further emphasize the temporal and between- and within-species variability in FIDs in the Results and now specifically indicate that lockdowns did not influence FIDs above such variability (Fig. 2-3, Fig. S3). In other words, the natural (e.g. temporal) variation in FIDs seems far greater than the potential effects of lockdowns (Fig. 2). We believe that even if lockdowns had tiny effects that could have been detected with a more stringent experimental design (e.g. individually tagged birds) or even more complex models, such effects would be far from being biologically meaningful.

    2. Reviewer #1 (Public Review):

      This paper uses a series of flight initiation "challenges" conducted both prior to and during COVID-19-related restrictions on human movement to estimate the degree to which avian escape responses to humans changed during the "anthropause". This technique is suitable for understanding avian behavioral responses with a high degree of repeatability. The study collects an impressive dataset over multiple years across five cities on two continents. Overall the study finds no effect of lockdown on avian escape distance (the distance at which the "target" individual flees the approaching observer). The study considers the variable of interest as both binary (during lockdown or prior to lockdown) and continuous, using the Oxford Stringency Index (with neither apparently affecting escape distance).

      Overall this paper presents interesting results which may suggest that behavioral responses to humans are rather inflexible over "short" (~2 year) timespans. The anthropause represents a unique opportunity to disentangle the mechanistic drivers of myriad hypothesized impacts humans have on the behavior, distribution, and abundance of animals. Indeed, this finding would provide important context to the larger body of literature aimed at these ends. However, the paper could do more to carefully fit this finding into the broader literature and, in so doing, be a bit more careful about the conclusions they are able to draw given the study design and the measures used. Taking some of these points (in no particular order):

      1) Oxford Stringency Index is a useful measure of governmental responses to the pandemic and it's true that in some scenarios (including the (Geng et al. 2021) study cited by this paper) it can correlate with human mobility. However, it is far from a direct measure of human mobility (even in the Geng study, to my reading, the index only explained a minority of the variation). Moreover, particular sub-components of the index are wholly unrelated to human mobility (e.g., would changes to a country's public information campaign lead to concomitant changes in urban human mobility?). Finally, compliance with government restrictions can vary geographically and over time (i.e., we might expect lower compliance in 2021 than in 2020) and the index is calculated at the scale of entire countries and may not be very reflective of local conditions. Overall this paper could do more to address the potential shortcomings of the Oxford Stringency Index as a measure of human mobility including attempting to validate the effect on human mobility using other datasets (e.g., the google dataset and/or those discussed in Noi et al. 2022). This is of critical importance since the fundamental logic of the experimental design relies on the assumption that stringency ~ mobility.

      2) The interpretation of the primary finding (that behavioral responses to humans are inflexible) could use a bit more contextualization within the literature. Specifically, the study offers three potential explanations for the observed invariance in escape response: 1) these behaviors are consistent within individuals and this study provides evidence that there was no population turnover as a result of lockdowns; 2) escape response is linked to other urban adaptations such that to be an urban-dwelling species dictates escape response; and/or 3) these populations already exhibit maximum habituation and the reduction in human mobility would only have increased that habituation but that trait is already at a boundary condition. Some comments on each of these respectively:

      a) Even had these populations turned over as a result of a massive rural-to-urban dispersal event, it's not clear that the escape distance in those individuals would be different, because this paper does not establish that these hypothetical rural birds have a different behavioral response which would be constant following dispersal. Thus the evidence gathered here is insufficient to tell us about possible relocations of the focal species. Additionally, the paper cites several papers that found no changes in abundance or movements of animals in response to lockdowns but ignores others that did. For example: (Wilmers et al. 2021), (Warrington et al. 2022) (though this may have been published after this was submitted...), and (Schrimpf et al. 2021). There is a missed opportunity to consider the drivers of some of these results - the findings in this paper are interesting in light of studies that *did* observe changes in space use or abundance - i.e., changes in space use could arise precisely *because* responses to humans are non-plastic but the distribution and activities of humans changed. To wit, the primary finding here would imply that the reaction norm to human presence is apparently fixed over such timescales - however, and critically, the putative reduction in human activity/mobility combined with fixed responses at the individual level might then imply changes in avian abundance/movement/etc.

      b) If this were the case, wouldn't this be then measurable as a function of some measure of urbanity (e.g. Human Footprint Index) that varies across the cities included here? Site accounted for ~15% of the total variation in escape distance but was treated as a random effect - perhaps controlling for the nature of the urban environment using some e.g., remotely sensed variable would provide additional context here.

      c) Because it's not clear the extent to which the populations tested had turned over between years, the paper could do with a bit more caution in interpreting these results as behavioral. This study spans several years so any response (or non-response) is not necessarily a measure of behavioral change because the sample at each time point could (likely does) represent different individuals. In fact, there may be an opportunity here to leverage the one site where pre-pandemic measures were taken several years prior to the pandemic. How much variance in the change in escape distance is observed when the gap between time points far exceeds the lifetime of the focal taxa versus measures taken close in time?

      d) Finally, I think there are a few other potential explanations not sufficiently accounted for here:

      i) These behaviors might indeed be plastic, but not over the timescales observed here.
      ii) Time of year - this study took place during the breeding season. The focal behavior here varies with the time of year, for example, escape distance for many of these species could be tied up in nest defense behaviors, tradeoffs between self-preservation and e.g., nest provisioning, etc.
      iii) Escape behaviors from humans are adaptively evolved, strongly heritable, and not context dependent - thus we would only expect these behaviors to change on evolutionary timescales.
      iv) See point one above - it's possible that the lockdown didn't modify human activity sufficiently to trigger a behavioral response or that the reaction norm to human behavior is non-linear (e.g. a threshold effect).

      LITERATURE CITED

      Geng DC, Innes J, Wu W, Wang G. 2021. Impacts of COVID-19 pandemic on urban park visitation: a global analysis. J For Res 32:553-567. doi:10.1007/s11676-020-01249-w

      Noi E, Rudolph A, Dodge S. 2022. Assessing COVID-induced changes in spatiotemporal structure of mobility in the United States in 2020: a multi-source analytical framework. Int J Geogr Inf Sci.

      Schrimpf MB, Des Brisay PG, Johnston A, Smith AC, Sánchez-Jasso J, Robinson BG, Warrington MH, Mahony NA, Horn AG, Strimas-Mackey M, Fahrig L, Koper N. 2021. Reduced human activity during COVID-19 alters avian land use across North America. Sci Adv 7:eabf5073. doi:10.1126/sciadv.abf5073

      Warrington MH, Schrimpf MB, Des Brisay P, Taylor ME, Koper N. 2022. Avian behaviour changes in response to human activity during the COVID-19 lockdown in the United Kingdom. Proc Biol Sci 289:20212740. doi:10.1098/rspb.2021.2740

      Wilmers CC, Nisi AC, Ranc N. 2021. COVID-19 suppression of human mobility releases mountain lions from a landscape of fear. Curr Biol 31:3952-3955.e3. doi:10.1016/j.cub.2021.06.050

    3. Reviewer #2 (Public Review):

      Mikula et al. have extensive experience studying the escape distances of birds as a proxy of behavioral adaptation to urban environments. They took advantage of the exceptional conditions of social distancing and reduced mobility during the covid-19 pandemic to continue sampling urban populations of birds under exceptional circumstances of low human disturbance. Their aim was to compare these new data with data from previous "normal" years and check whether bird behavior shifted or not as a consequence of people's lockdown. Therefore, this study would add to the growing body of literature assessing the effect of the covid-19 shutdown on animals. In this sense, this is not a novel study. However, the authors provide an interesting conclusion: birds have not changed their behavior during the pandemic shutdown. This lack of effects disagrees with most of the previously published studies on the topic. I think that the authors cannot claim that urban birds were unaffected by the covid-19 shutdown. I think that the authors should claim that they did not find evidence of covid-19-shutdown effects. This point of view is based on some concerns about data collection and analyses, as well as on the evolutionary and ecological rationale used by the authors both in their hypotheses and in their interpretation of results. I will explain my criticisms point by point:

      1) The authors used ambivalent, sometimes contradictory, reasoning in their predictions and results interpretation. Some examples:
      1.1) The authors claimed that urban birds perceive humans as harmless (L224), but birds actually escape from us when we approach them... Furthermore, they usually escape 5 to 20 m away. This is more distance than would be necessary just to avoid being trampled.
      1.2) If we are harmless, why should birds spend time monitoring us as a potential threat (L102)? Indeed, I disagree with the second prediction of the authors. I could argue that reduced human activity should increase animal vigilance because real bird predators (e.g., raptors) may increase their occurrence or activity in empty cities. If birds should increase their vigilance because the invisible shield of human fear of their predators is no longer available, then I would expect longer escape distances.
      1.3) To justify the same escape behavior shown by birds in pre- and pandemic conditions from an adaptive point of view, the authors argued a lack of plasticity and a strong genetic determination of such behavior. This contravenes the plasticity proposed in the previous point or the expected effect of the stringency index (L112). In my opinion, some degree of plasticity in the escape behavior would be really favorable for individuals from an adaptive perspective, as they may face quite different fear landscapes during their lives. Looking at the figures, one can see notable differences in the escape distance of the same species between sites in the same city. As I can hardly imagine great genetic differences between birds sampled in a park or a cemetery in Rovaniemi, for instance, I would expect a major role of plasticity to explain the observed variability. Furthermore, if escape behavior were not plastic, I would not expect date or hour effects. By including them in their models, the authors are implicitly accepting some degree of plasticity.

      2) Looking at the figures I do not see the immense stochasticity (L156, Fig. S3, S5) claimed by the authors. Instead, I can see that some species showed an obvious behavioral change during the shutdown. For instance, Motacilla alba, Larus ridibundus, or Passer domesticus clearly reduced their escape distances, while others like the Dendrocopos major, Passer montanus, or Turdus merula tended to increase it. On the other hand, birds in Poland tended to have larger escape distances during the shutdown for most species, while in Rovaniemi there was an apparent reduction of escape distances in most cases. The multispecies and multisite approach is a strength of this study, but it is an Achilles' heel at the same time. The huge heterogeneity in bird responses among species and sites counterbalanced and as a result, there was an apparent lack of shutdown effects overall. Furthermore, as most data comes from a few (European) species (i.e., Columba, Passer, Parus, Pica, Turdus, Motacilla) I would say that the overall results are heavily influenced (or biased) by them. The authors realize that results are often area- or species-specific (L203), therefore, does a whole approach make sense?

      3) The previous point is worsened by the heterogeneity of cities and periods sampled. For instance:
      3.1) I can hardly imagine any common feature between a small city in northern Finland (Rovaniemi) and a megacity in Australia (Melbourne). Thus, I would not be surprised to find different results between them.
      3.2) Prague baseline data were from 2014 and 2018, while those for the rest of the study sites were from 2018 and 2019. If study sites used a different starting point, you cannot compare differences at the final point.
      3.3) Due to the obvious seasonal differences between the northern and southern hemispheres, data collection in Australia began five months later than in the rest of the sites (Aug vs Mar 2020). There, urban birds had already faced many months of reduced human disturbance, while European birds were sampled just at the beginning of the lockdown.
      3.4) Some cities were sampled by a single observer, while others were sampled by many. Even if all of them are skilled birders, they represent different observers from a statistical point of view and consequently, observer identity was an extra source of noise in your data that you did not account for.

      4) Although I liked the stringency index as a variable, I am not sure if it effectively captured the actual human activity every day. Even if restrictive measures were similar between countries, actual compliance with them greatly depended on people's commitment and authorities' control and sanctions. I would suggest using a more realistic measure of human activity, such as Google mobility reports.

      5) The authors used escape trials from birds on the ground and perched birds. I think that they are not comparable, as birds on the ground probably perceive a greater risk than those placed some meters above the ground, i.e. I would expect shorter escape distances for perched birds. As this can be strongly dependent on the species preferences or sampling site (i.e, more or less available perches), I wonder how this mixture of observations from birds on the ground and perched birds could be affecting the results.

      6) The authors did not sample the same location in the same breeding season to avoid repeated sampling of the same individuals (L331). This precaution may help, but it does not guarantee a lack of pseudoreplication. Birds are highly mobile organisms and the same individuals may be found in different places in the same city. This pseudoreplication seems particularly plausible for Rovaniemi, where sampling points must be necessarily close due to the modest size of this city.

      7) An intriguing result was that the authors collected data for 135 species during the shutdown, while they collected data only for 68 species before the pandemic. Such a two-fold increase in bird richness would not be expected with a 36% increase in sampling effort during 2020-21. I wonder if this could be reflecting an actual increase in bird richness in urban areas as a positive result of the shutdown and reduced human presence.

      8) The authors dismissed the multicollinearity problem of explanatory variables unjustifiably (L383). However, looking at fig. S1, I can see strong correlations between some of them. For instance, period and stringency index were virtually identical (r=0.95), while temperature and date were also strongly correlated.

      9) The random structure of the models is a key element of the statistical analyses but those random factors are poorly explained and justified. I needed to look up the supplementary tables to fully understand the complex architecture of the random part of the models. To the best of my knowledge, random variables aim to account for undesirable correlations in the covariance matrix, which is expected in hierarchical designs, such as the present one. However, the theoretical violation of data independence may happen or not. As the random structure is usually of little interest, you should keep it as simple as necessary, otherwise random factors may be catching part of data variability that you would like to explain by fixed variables. I think that this is what is happening (at least, in part) here, as the authors included a too-complex random structure. For instance, if you include the year as a random factor, I think that you are leaving little room for the period effect. The authors simplified the random structure of the models (L387), but they did not explain how. Nevertheless, this model selection was not important at all, as the authors showed the results for several models. I assume, consequently, that the authors are considering all these models equally valid. This approach seems quite contradictory.
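
      The collinearity concern in point 8 can be checked directly before model fitting. The following is a minimal sketch, not the authors' analysis: the predictor names and values are invented for illustration, and the 0.7 cut-off is only an assumed rule of thumb. It computes pairwise Pearson correlations among candidate predictors and flags pairs that exceed the threshold.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def flag_collinear(predictors, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    The default threshold is an assumption chosen for illustration only.
    """
    names = list(predictors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(predictors[a], predictors[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Hypothetical values, NOT data from the study under review.
predictors = {
    "period": [0, 0, 0, 0, 1, 1, 1, 1],          # 0 = before, 1 = during shutdown
    "stringency": [0, 0, 5, 0, 60, 75, 70, 65],  # index value per observation
    "hour": [8, 11, 9, 12, 8, 12, 9, 11],        # hour of day of the trial
}

for a, b, r in flag_collinear(predictors):
    print(f"{a} vs {b}: r = {r:.2f}")
```

      With toy data like these, a period indicator and a stringency value that is near zero before the shutdown and high during it are almost interchangeable, which is the situation the reviewer describes for r = 0.95.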

    1. Abstract: Recent advances in bioinformatics and high-throughput sequencing have enabled the large-scale recovery of genomes from metagenomes. This has the potential to bring important insights as researchers can bypass cultivation and analyse genomes sourced directly from environmental samples. There are, however, technical challenges associated with this process, most notably the complexity of computational workflows required to process metagenomic data, which include dozens of bioinformatics software tools, each with their own set of customisable parameters that affect the final output of the workflow. At the core of these workflows are the processes of assembly - combining the short input reads into longer, contiguous fragments (contigs), and binning - clustering these contigs into individual genome bins. Both processes can be done for each sample separately or by pooling together multiple samples to leverage information from a combination of samples. Here we present Metaphor, a fully-automated workflow for genome-resolved metagenomics (GRM). Metaphor differs from existing GRM workflows by offering flexible approaches for the assembly and binning of the input data, and by combining multiple binning algorithms with a bin refinement step to achieve high quality genome bins. Moreover, Metaphor generates reports to evaluate the performance of the workflow. We showcase the functionality of Metaphor on different synthetic datasets, and the impact of available assembly and binning strategies on the final results. The workflow is freely available at https://github.com/vinisalazar/metaphor.

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.1093/gigascience/giad055), and the reviews have been published under the same license. These are as follows.

      **Reviewer 1. Thomas Brüls**

      The authors present a snakemake-based workflow to automate and chain the main computational ingredients (assembly and binning) of genome-centric metagenomics. The authors developed a technically sound tool for this purpose, and by itself it is certainly valuable to the research community and worthy of publication. However, even if the article is cast as a technical note (hence with an emphasis on the design, implementation and assessment of the tool), I feel that a more thorough discussion of both its abilities and inabilities (e.g. strain resolution, detection of low abundance organisms, identification of virus bins, etc.) would be worthwhile for a more general audience. By the same token, a deeper discussion of some of the results obtained with their tool (see below) would be of interest and would also illustrate useful use cases. I would suggest the following modifications/additions:

      - The experiments with the strain madness dataset suggest that the genomes (or fragments thereof, i.e. the bins) resolved should be viewed as "species" genomes, or composite genomes possibly originating from multiple strains. If so, do the authors think this represents a hard limit to the assembly + binning approach, or could further existing tools (e.g. performing variant detection on top of cross-assembly before the binning step) be integrated or developed in the future for strain resolution (i.e. to identify strains not dominant in any sample)?
      - Related, a simple summary of the number of individual strains recovered in individual bins for the strain madness experiment would be interesting.
      - Another issue that would be worth discussing in my opinion is the impact of genome abundance on the recovery of corresponding bins and their quality. The platform developed by the authors appears to be well suited for this kind of analysis, and the results would be of both theoretical and practical interest. To put it simply, what is the minimal initial coverage of genomes required in order for them to be recovered in bins of a given size and quality?
      - Remark: these two issues (strain-level diversity and individual strain genome abundances) likely interact to limit bin resolution, and this could be mentioned by the authors.
      - The data presented by the authors suggest that the metabat binning engine significantly outperforms the other two tools (concoct and vamb, which are both widely used), see e.g. Figure 2. What would account for that, and do the authors think this is a general observation (i.e. beyond the specific CACB setting or marine metagenome shown in Fig 2)?
      - A bin refinement step (based on the DAS tool and dereplication) is frequently mentioned but should be described in more detail (including a precise definition of the bin quality metric used).

      Further, rather minor comments:

      - In the abstract, when mentioning "technical challenges associated with...", it would be worth mentioning that algorithmic challenges are present as well.
      - In the introduction: "It is hypothesised that pooled assembly and binning may lead to improved results when analysing communities with high genetic diversity, and to poorer results when there is a high level of intraspecies/strain-level diversity". I would assume there are many instances in the real world that are both, i.e. that present both high inter-species and intra-species genetic diversity. What then?
      - In the future directions, the authors mention the identification of eukaryotic and viral contigs and bins, and could briefly elaborate on how this could be done properly.
      - The sentence "In summary, our assessment of ..." at the end of the ms appears to have a syntactic problem.

    1. Abstract: Hetnets, short for “heterogeneous networks”, contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious not only how metformin is related to breast cancer, but also how the GJA1 gene might be involved in insomnia. We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any two nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We find that predictions are broadly similar to those from previously described supervised approaches for certain node type pairs. Scoring of individual paths is based on the most specific paths of a given type. Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open source implementation of these methods in our new Python package named hetmatpy.

      **Reviewer 2. Paolo Provero**

      In this work Himmelstein and collaborators introduce a statistically controlled way of extracting significant node pairs in heterogeneous networks (hetnets) without relying on a ground truth and related training. The method "explains" why two nodes are significantly connected by extracting the metapaths most responsible for the enrichment. This is based on computing a null distribution of the DWPC, which allows assigning a P-value to each metapath joining two nodes, and then visualizing the individual paths responsible for the enrichment. The method is novel and significant, and can in principle be applied to many hetnets, in life sciences and beyond, when a ground truth is not available or not desirable as it would introduce bias. The software tools developed appear to be readily available to other researchers.
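
      The general idea of turning a null distribution into a P-value can be sketched as follows. This is a generic empirical P-value, not the procedure used in the paper under review, whose null model is not specified in this report. The function name and the numbers are hypothetical, and the null values are assumed to have been computed already (for instance, from randomized networks).

```python
def empirical_p_value(observed, null_values):
    """One-sided empirical P-value of `observed` against a null sample.

    Counts null values at least as large as the observed statistic and
    applies the usual +1 correction so the estimate is never exactly zero.
    """
    if not null_values:
        raise ValueError("null_values must not be empty")
    n_at_least = sum(1 for v in null_values if v >= observed)
    return (n_at_least + 1) / (len(null_values) + 1)


# Hypothetical statistic for one metapath between two nodes, compared with
# hypothetical values of the same statistic from randomized networks.
observed_score = 3.0
null_scores = [0.0, 1.0, 2.0, 4.0]
print(empirical_p_value(observed_score, null_scores))
```

      An empirical estimate of this kind cannot go below 1 / (number of null draws + 1), so a method that needs very small P-values for many node pairs would have to rely on many more draws or on a fitted null distribution.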

      Major comment: If I understand correctly, given two nodes (say "Alzheimer disease" and "Circadian rhythm") the method extracts, in a statistically controlled way, the most significant metapaths joining the two nodes, and then the individual paths responsible for the enrichment. But this is not the most obvious question a life scientist would ask the network, which would be instead something like "Which are the pathways most significantly connected to "Alzheimer disease"? Indeed this type of question would be the one to ask when aiming for drug repurposing (possibly replacing "pathways" with "compounds" or "pharmacologic classes"). Based on Fig. 4A, the pathways are presented, or "suggested," in decreasing order of number of metapaths, but this is hardly a ranking by significance. Would it be possible to summarize the results in such a way as to rank the pathway nodes connected to a given disease node by significance (or more generally to rank the nodes of a certain type by the significance of their connection to a given node of another type)? This should be discussed.

      I also have several minor concerns. (1) The authors introduce and compute a null distribution of the DWPC which takes into account node degree in a statistically controlled way when evaluating the connectivity between two nodes. However, the DWPC itself does take into account node degree, as the name implies, and contains a tunable parameter that can be optimized, at least when a ground truth is available (as in Ref 39 by the same first author). I understand such tuning is not possible when, as in the present case, no ground truth is available, but the authors should make this point more clearly. (2) I find Fig. 1B a bit confusing: according to the legend, the top rows are known treatments, which should have higher than expected connectivity. However, based on the colors as explained by the legend, the bottom treatment/disease pairs seem to have higher connectivity. (3) The acronym DWPC is defined after it has been used several times. (4) The legend of Figure 2 should specify that these results apply to the nodes "Alzheimer disease" and "Circadian rhythm", although this becomes clear in Fig. 4. (5) I don't think Figure 3, representing the home page of the web site, is especially useful. (6) I found Fig. 4 confusing: the sum of the path counts for the selected metapaths in panel B is way larger than the 425 results shown in Panel C. As far as I understand, no path can belong to more than one metapath, so is there some further selection here? (7) The "Frontend" section of the Methods seems a bit too detailed for the Gigascience audience.

      Re-review: The authors have addressed all my comments in a satisfactory way.

    1. Author Response

      Reviewer #2 (Public Review):

      This work attempts to connect the diet of a mother to the physiology and feeding behaviors of multiple generations of her offspring. Using genetic and molecular biology approaches in the fruit fly model, the authors argue that this Lamarckian inheritance is mediated by germline-inherited chromatin and is regulated by the general activity of a histone methylase. However, many of the measured effects are small and variable, the statistical tests to prove their significance are missing or poorly described, and some experiments are inadequately described and lack important controls.

      1) The authors claim that the diet of a mother can influence the physiology of her progeny for several generations. However, the observed effects of maternal diet on later generations were small and variable for most assays (see Fig1C, S1.1A, B, D). Additionally, the effect size between F0 HSD to ND was often larger than the effect size between the progeny of F0 parents and ND. To put it another way, if the authors were to compare the F1, F2, etc. to the F0 HSD flies, they would conclude that the majority of the response to diet is not maternally transmitted, and is directly controlled by the diet of the individual being measured.

      We agree with the reviewer that the effect size of acute HSD exposure (in HSD-F0 flies) was stronger than that of transgenerational inheritance (in HSD-F1/2/3/4 flies). Similar observations were also made in other studies, see Klosin et al., Science, 2017, Bozler et al., eLife, 2019. We would argue this difference in effect size was as expected and with clear biological relevance.

      For all living organisms, acute environmental changes (diet change included) have direct and profound influences on their survival and reproduction, and therefore need robust and immediate responses. In comparison, ancestral environmental changes may only provide some vague and indirect indications of the current living environment of the offspring. Such information may be beneficial for the survival and reproduction of the offspring, but the effect size is expected to be much smaller, or at least smaller than that of acute environmental changes.

      Studies of the Dutch Famine offer a good example. Human individuals who were prenatally exposed to famine were found to be at greater risk of metabolic diseases (Ravelli et al., NEJM, 1976). Nevertheless, direct high-fat diet exposure was still the much stronger risk factor for obesity and metabolic disorders (Bray et al., Am J Clin Nutr, 1998, Jéquier et al., Int J Obes Relat Metab Disord, 2002).

      We have added additional discussions in the manuscript for clarification.

      Furthermore, since our current study aimed to investigate the mechanism of behavioral transgenerational inheritance, we focused on the comparison between HSD-F1 flies (and their progeny) vs. ND-fed flies. As the ancestors of HSD-F1/2/3/4 flies were exposed to HSD, whereas HSD-F1/2/3/4 flies themselves were never exposed to HSD, any difference we observed between the two groups could be solely attributed to transgenerational inheritance of ancestral HSD exposure. That said, to better distinguish the effects of acute HSD exposure vs. transgenerational inheritance upon ancestral HSD exposure, we re-analysed and presented the comparisons among ND, HSD-F0, and HSD-F1 data in the manuscript (Figure 1. B-E, Figure 1-figure supplement 1. A-E, Figure 1-figure supplement 2. A-D, Figure 3. D-E, Figure 3-figure supplement 1. B-D, Figure 3-figure supplement 2 and 3. A-B).

      2) The authors chose to study PER, which had the largest average effect sizes between conditions. However, PER was highly variable in the averaged data, with some individuals showing large effects and others having no effects. A better characterization of transgenerational PER may increase the robustness of this assay and confidence in its results. For example, the authors could measure PER in lineages derived from individual flies to determine when transgenerational effects on PER decline or disappear. This form of data collection could help to explain the high variation in the averaged data presented in the paper.

      We acknowledge that PER in general is quite a variable behavioural trait (as are most, if not all, behavioural measures). This is not surprising, since animal behaviours, as complex traits, can be influenced by numerous intrinsic and extrinsic factors, such as genetic background, developmental environment, diet, population density, environmental conditions, etc. Numerous PER studies have exhibited similar variability (Masek et al., PNAS, 2010, Marella et al., Neuron, 2012, Charlu et al., Nature Communications, 2013, Wang et al., Cell Metabolism, 2016, Wang et al., Cell Reports, 2020).

      Nevertheless, in our current study we were able to identify statistically significant behavioural difference between ND-fed flies and HSD-F1/2/3 flies, demonstrating that ancestral HSD exposure imposed transgenerational inheritance on sweet sensitivity. To further increase the robustness of the study as suggested by the reviewer, we have conducted additional repetitions of many PER experiments and further confirmed the phenotype with less variability and more statistical power (Figure 1. G-I, Figure 3. D-E, Figure 3-figure supplement 1. B-D, Figure 3-figure supplement 2 and 3. A-B). The reviewer also suggested the use of isogenic flies, which might help to minimize the variations of genetic background. However, we think that demonstrating the behavioural difference in genetically diverse fly populations is a more credible way to show that such transgenerational inheritance is a reliable and generalizable phenomenon.

      3) What do the error bars represent on any figure? There are many examples where the data is highly variable and lies completely outside of the error bars. What is the statistical test for significance that is carried out in each figure? The brief comment about statistics in the methods section is inadequate. The authors should also supply the raw data used to generate the figures so that readers can perform their own statistical tests.

      Data in the manuscript were represented as means ± SEM (standard error of the mean) in all of our figures, which is a standard practice in the field (Masek et al., PNAS, 2010, Charlu et al., Nature Comm, 2013, Wang et al., Cell Metabolism, 2016). We have provided detailed explanations of the statistical tests in the manuscript. We have also prepared raw data files as suggested by the reviewer.
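
      For reference, the mean ± SEM summary described above can be computed as in the short sketch below. The input values are hypothetical response fractions invented for illustration, not data from the manuscript; SEM is taken as the sample standard deviation divided by the square root of the number of observations.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for a sample.

    Uses the sample standard deviation (n - 1 in the denominator).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical per-replicate response fractions, for illustration only.
responses = [0.2, 0.4, 0.6, 0.8]
m, sem = mean_sem(responses)
print(f"{m:.2f} ± {sem:.2f}")
```

      Because SEM shrinks with sample size while the spread of individual data points does not, individual observations can fall well outside SEM error bars, which is relevant to the reviewer's question about points lying outside the bars.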

      The model that global H3K27me3 is regulated by ancestral diet is unconvincing without further experimental validation and explanation. Points 4-10 address specific issues.

      4) The authors performed ChIP on cycle 11 embryos. This stage is extremely short (11 min) and contains roughly 10 times less chromatin than embryos only 30 minutes older. These features make it very difficult to collect large numbers of precisely staged embryos without significant contamination. It is also debatable whether early cell cycles (including and preceding cycle 11) are slow enough to deposit and propagate histone marks in the presence of new histone incorporation. See the opposing arguments in Zenk et al 2017 and Li et al 2014. The authors could perform ChIP on older embryos to avoid this controversy.

      We thank the reviewer for the clarification. Our embryo collection protocol involved allowing flies to lay eggs freely in a cage for 30 minutes followed by 50 minutes of incubation on a juice plate, and then completing the embryo sorting within 30 minutes. Therefore, to describe it more stringently, our embryos should be between cycles 10 and 12. We have corrected this information in the manuscript (Figure 2. A).

      Since all the embryos were sorted using the same morphological criteria within the same time frame, their developmental stages should be comparable (i.e. all from cycle 10-12). In several references we consulted, a broader range (cycle 9-13) was used for ChIP-seq analysis (for example, see Zenk et al., Science, 2017).

      Surely any maternally inherited information will also be present in cycle 14 or 15 embryos if it is to influence the development or physiology of the brain. The observed differences in global H3K27me3 levels in F1 vs ND flies could be explained by slightly different aged embryo collections or technical variations in the ChIP protocol. The authors could strengthen their conclusion by performing more ChIP replicates. Alternatively, the authors could use orthogonal approaches like antibody staining or western blots to measure global H3K27me3 levels in precisely staged embryos.

      We chose to use cycle 10-12 embryos because we aimed to identify epigenetic modulations directly transmitted through the maternal germline. Embryos in cycle 14-15 might reveal more profound changes, but since embryos in that stage had entered the zygotic phase and started the remodeling of histone modifications, we think it might mask the maternally transmitted changes we sought to identify.

      In addition, we conducted two biological replicates for each group for the ChIP-seq analysis, which was a standard in the field (Zenk et al., Nature, 2021, Ing-Simmon et al., Nature Genetics, 2021). In the current study we further verified the genes identified in the ChIP-seq analysis in RNA-seq and qPCR analysis.

      We further verified the ChIP-seq results by western blot, which showed a ~2-fold increase in H3K27me3 modification in HSD-F1 early embryos vs. ND-fed embryos, in line with the ChIP-seq data (Figure 2-figure supplement 1. B). We have also provided immunofluorescence results for embryos at cycle 13 and cycle 14, which clearly showed a significant increase in H3K27me3 modifications in HSD-F1 embryos (Figure 2-figure supplement 1. C).

      5) The authors measure PRC2 subunit mRNA levels in adult fly heads to attempt to explain the observed differences in inherited H3K27me3 levels in fly embryos. The authors should examine PRC2 components in germ cells and early embryos to understand how germ cells and early embryos generate H3K27me3 patterns.

      We have now shown that Pcl and E(z) mRNA expression in HSD-F0 flies was not significantly changed vs. ND-fed flies (Figure 2-figure supplement 2. D-G). Meanwhile, the H3K27me3 demethylase UTX and the H3K27ac acetyltransferase Cbp showed a significant decrease (Figure 2-figure supplement 2. H). Therefore, HSD exposure imposed complex epigenetic modifications in HSD-F0 flies, which then led to transmission of epigenetic marks to their progeny. Given that the main scope of this study was to understand which epigenetic program mediated the behavioral transgenerational inheritance upon ancestral HSD exposure (rather than the program mediating the effects of acute HSD exposure), we focused our efforts on H3K27me3, which was significantly changed between HSD-F1 flies and ND-fed flies.

      6) The RNAi experiment targeting PRC2 components in embryos is uninterpretable without appropriate controls and an explanation of the genotypes used in the experimental paradigm. Are the authors crossing nosNGT mothers to UAS-RNAi fathers and assaying the progeny? What is the genotype of the F1 flies and how does it compare to the genotype of the ND flies? The authors should also note that the Gal4 drivers they use are not necessarily restricted to the ovary, and could directly affect other tissues controlling PER like neurons and muscle. Additionally, the authors should supply the appropriate controls to verify that their experimental paradigm has the intended effect. PRC2 proteins are presumably loaded into embryos and would be immune to zygotic-expressed RNAi. The authors could validate when PRC2 RNAi is effective by staining embryos for H3K27me3.

      We have now added schematic diagrams and detailed explanations in our revised manuscript to better explain the RNAi experiments (Figure 3-figure supplement 1. A). As shown in the diagram, we compared each RNAi treatment group to appropriate genetic controls. We have also noted in the manuscript that the GAL4 drivers we used were not restricted to the ovary.

      We have now verified the effect of PRC2 knockdown to reduce H3K27me3 in female germline by both western blot and immunofluorescence staining (Figure 3. B-C).

      7) Although the authors do not note this, nosNGT>RNAi affects the PER of ND flies (compare Gal4>RNAi to just RNAi or just Gal4 in ND columns in Fig3A-D). This could be due to RNAi expression in neurons or muscles or some other indirect effect. Regardless of the mechanism, this result makes it difficult to interpret how RNAi treatments affect the transgenerational inheritance of PER if there is an equivalently strong nontransgenerational effect.

      Although nosNGT>RNAi appeared to slightly affect PER response of ND-fed flies, there was no statistically significant difference (Figure 3-figure supplement 1. B and D, Figure 3-figure supplement 2. A-B). Rather, the effect of E(z) knockdown was evident in HSD-F1 flies (Figure 3-figure supplement 1. B), further confirming the involvement of H3K27me3 in transgenerational inheritance of PER reduction.

      8) The matalpha gal4 experiment is inadequately explained in the text or methods. Are the authors expressing RNAi in the ovaries of the F0 flies that are fed an HSD? Does the ovary influence their PER somehow? Similar to point 8, there appears to be a nontransgenerational component to the RNAi phenotype that clouds the interpretation of the transgenerational effect (compare F0 in S3.1A-C).

      We have now added a schematic diagram and detailed explanations in our revised manuscript to better explain the RNAi experiments (Figure 3. A). As shown in the diagram, we compared each RNAi treatment group to appropriate genetic controls.

      Similar to point 7, although Mat-tub-GAL4>RNAi might seem to affect PER responses of ND-fed flies, there was no statistically significant difference (Figure 3. D-E). Rather, the effect of E(z) knockdown was evident in HSD-F1 flies (Figure 3. D), further confirming the involvement of H3K27me3 in transgenerational inheritance of PER reduction.

      9) For the EED inhibitor experiments (both PER and calcium imaging), it is unclear whether the authors fed the mothers or their adult progeny the EED inhibitor. If adult progeny were fed, what tissues were affected? The authors should stain various tissues with an H3K27me3 antibody to verify the effectiveness of their inhibitor. Finally, the effect of the EED inhibitor on calcium imaging was not convincing because the variation was so large.

      We have added a new schematic diagram and provided more detailed explanations in the manuscript for pharmacological interventions (Figure 4. A-D). To verify the effect of the drug treatment, we showed that compared to the control group fed with DMSO, flies fed with the inhibitor showed a significant decrease in H3K27me3 levels, demonstrating the effectiveness of the inhibitor (Figure 4-figure supplement 1. A).

      We acknowledged the unsatisfactory quality of our calcium imaging experiments in our initial submission. We have now improved our experimental procedures to reach better data quality, while the conclusions remained consistent (Figure 4. E).

      10) In all of the PRC2 RNAi and inhibitor experiments, are there any other phenotypes that would suggest that the treatments are working? There are many published PRC2 loss-of-function phenotypes (molecular and developmental) in different tissues. The authors could assure the reader that their treatments are working as expected by doing these controls.

      As discussed above, we have now used western blot and immunofluorescence staining to validate the efficiency of PRC2 RNAi in female germline (Figure 3. B-C).

      11) The authors propose that a transgenerationally inherited state of the caudal gene is responsible for the transgenerationally inherited PER. However, the experiments investigating the methylation state and expression level of caudal are unconvincing. Cad mRNA abundance varied immensely in the ND RNAseq samples. When the authors compared cad levels across generations, the effect size was small. A single outlier in the ND sample in both the RNAseq and the RTPCR experiments appears to drive up its mean and effect size. The H3K27me3 ChIP on cad is very similar in the F1 and ND samples and the acetylation peak on its promoter appears unchanged. The authors could vastly improve the caudal experiments in this paper by simply using cad antibodies to stain the relevant tissues that contribute to PER. For example, the authors could stain GR5a neurons for cad expression in different generations that inherit (or don't inherit) maternal PER to more accurately determine if cad levels are indeed transgenerationally regulated. The authors could also perform more ChIP experiments at a less variable stage to convincingly correlate epigenetic marks on cad with its expression level.

      As discussed above, we conducted two biological replicates for each condition of the ChIP-seq analysis, which was a standard in the field (Zenk et al., Nature, 2021, Ing-Simmon et al., Nature Genetics, 2021). We have also performed western blot and immunofluorescence for H3K27me3 in ND vs. HSD-F1 embryos to further validate our ChIP-seq data (Figure 2-figure supplement 1. B-C).

      As for the Cad gene, H3K27me3 signals showed a statistically significant difference between ND-fed and HSD-F1 flies (Figure 5. D). We have also conducted additional qPCR experiments to verify the gene expression changes of the Cad gene (Figure 5. F, right), which were in line with the ChIP-seq data and further supported its validity.

      It is worth noting that during the developmental time window of our ChIP-seq analysis, the acetylation signals in the promoter region of cad were very low (Figure 5. D), making a meaningful comparison impossible.

      Reviewer #3 (Public Review):

      Jie Yang et al. investigated the transgenerational behavioral modification of a high-sugar diet (HSD) in Drosophila and revealed the underlying molecular and neural mechanisms. It has been reported that HSD exposure decreases sweet sensitivity in gustatory sensory neurons, resulting in reduced sugar response (Proboscis extension reflex, PER) in flies. The current study reports that this effect can be transmitted across generations through the maternal germline. Furthermore, the authors show that H3K27me3 modification is enhanced in the first-generation progenies of HSD-treated flies (F1), and genetical or pharmacological disruption of PCL-PRC2 complex blocks the behavioral change and restores the sweet sensitivity in the Gr5a+ sweet sensory neurons. The authors further analyze the differentially expressed genes in the F1 flies. Among H3K27me3 hypermethylated regions, they focus on homeobox genes and find a transcription factor Caudal (Cad), which shows decreased expression in the F1 flies. Knocking down Cad in Gr5a+ neurons results in decreased PER response to sucrose.

      Transgenerational changes in physiology and metabolism have been broadly studied, while inherited changes at the behavioral level are much less investigated. This work provides convincing evidence for transgenerational modification of feeding behavior and digs out the underlying molecular and neural mechanisms. However, there still are several concerns that need to be clarified.

      1) The epigenetic regulator PRC2 has been found to play an essential role in the 7d-HSD-induced modification of the PER response. In this study, it's important to clarify, for the transgenerational change, whether epigenetic modification is required in the flies exposed to HSD (F0), the progenies (F1), or both. It would be very helpful for better interpretation if the procedures of HSD treatment in RNAi experiments and the drug treatments were stated in more detail. In addition, the F0 flies should be examined as the control.

      In this current study our main scope was to understand the transgenerational influence of HSD exposure on the progeny. To this aim, we chose to study the physiological and behavioral differences between ND-fed flies vs. HSD-F1 flies (and their progeny on ND). HSD-F1 flies (and their progeny) were not exposed to HSD in their whole life cycle and therefore the physiological and behavioral changes we observed vs. ND-fed flies could be solely attributed to epigenetic modifications transmitted via germline cells from HSD-F0 flies. Therefore ND-fed flies were used as the main control.

      As for HSD-F0 flies, the acute effects of HSD exposure could be more complex. Epigenetic factor was likely involved, as evident in Figure 3-figure supplement 1. C, Figure 3-figure supplement 3. A-B and Figure 4. C. In addition, HSD exposure might also directly affect gene expression and multiple signaling pathways in HSD-F0 flies (see Chen et al., Science China Life Sciences, 2020). Therefore, we did not aim to investigate how HSD exposure affected HSD-F0 flies in this current study. We have added additional discussions in the manuscript for clarification.

      That said, we have added more HSD-F0 flies as controls where needed (Figure 2-figure supplement 2. D-G, Figure 3-figure supplement 1. C, Figure 4. C, Figure 5. F, left).

      We have also modified the schematic diagrams and added more detailed explanations in the manuscript, in order to provide a clearer illustration of the experimental procedures (Figure 3. A, Figure 3-figure supplement 1. A, Figure 4. A, B and D). Specifically, we employed two different RNAi approaches. Firstly, we used genetic methods to obtain homozygous Mat-tub-gal4>UAS-gene X RNAi fly lines on chromosomes Ⅱ and Ⅲ for germline-specific knockdown (Figure 3, Figure 3-figure supplement 3). Secondly, we used heterozygous nosNGT-gal4>UAS-gene X RNAi flies for embryo-specific knockdown (Figure 3-figure supplement 1 and 2). Our drug experiments involved both treating the flies and measuring their PER (Figure 4. A-C) and treating the parental flies and measuring the PER of their progeny (Figure 4. D).

      2) The information on the drug treatment period is also missing for imaging experiments (Fig.4C). Moreover, the response curve is very different from those recorded in the same neurons in previous studies. What’s the reason? Please also provide a representative image showing which part of the Gr5a neurons is recorded.

      The experimental procedures for the drug treatments are now shown in Figure 4. A. We fed adult flies with specific compounds for five days after eclosion, then measured the calcium signals of Gr5a+ neurons when flies were fed with sucrose.

      As suggested by the reviewer, we have now conducted calcium imaging experiments more carefully and thoroughly. We have now added the new data into the revised manuscript and the conclusions remained consistent (Figure 4. E). We recorded the calcium signal in the axons of Gr5a+ neurons in the SEZ.

      3) It's unclear whether the decreased Cad expression upon HSD treatment specifically occurred in Gr5a+ neurons or a lot of cells. If the change in gene expression is significant in the qPCR test, it should occur in a large number of cells, most likely including different types of gustatory sensory neurons. If lower cad expression led to lower neural response and thereby lower behavioral response, how does it specifically decrease the PER response to sucrose but not to other tastes? Is HSD-induced desensitization specific to sucrose in the offspring?

      We agree that Cad expression might decrease in a lot of cells including Gr5a+ neurons in the proboscis. In order to investigate whether taste perception other than sweet sensing was also affected, we conducted PER experiments with fatty acids, which, like sugars, are appetitive taste cues. Perception of fatty acids is mediated by ionotropic receptors such as ir25a, ir76b, and ir56b (Ahn et al., eLife, 2017, Brown et al., eLife, 2021).

      Our results indicate that the PER to fatty acids in HSD-F0 and HSD-F1 flies was not significantly reduced compared to the ND-fed controls (Figure 1-figure supplement 2. E-F). This suggests that the impact of Cad on gustatory sensory neurons might be specific to the sweet sensitivity of Gr5a+ neurons.

      4) In Fig.2D, data are sorted for genomic regions showing an up-regulated modification of H3K27me. It's unclear whether similar sorting was performed in panel C. This needs to be clarified.

      The analyses shown in Figure 2C and 2D were linked. As for 2C, we identified genomic loci with enriched H3K27me3, H3K9me3, and H3K27ac peaks, and found that H3K27me3 peaks showed the most robust changes between ND-fed and HSD-F1 flies. Therefore we concentrated on the loci where H3K27me3 modifications were significantly changed between the two groups, and further analyzed their difference. As shown in Figure 2D, within these loci, H3K27ac modifications, which are functionally antagonistic to H3K27me3, were significantly reduced, whereas H3K9me3 signals remained unchanged. These results confirmed that ancestral HSD exposure induced robust H3K27me3 modifications in certain genomic loci.

    1. Abstract

      Transformer-based language models are successfully used to address massive text-related tasks. DNA methylation is an important epigenetic mechanism and its analysis provides valuable insights into gene regulation and biomarker identification. Several deep learning-based methods have been proposed to identify DNA methylation and each seeks to strike a balance between computational effort and accuracy. Here, we introduce MuLan-Methyl, a deep-learning framework for predicting DNA methylation sites, which is based on five popular transformer-based language models. The framework identifies methylation sites for three different types of DNA methylation, namely N6-adenine, N4-cytosine, and 5-hydroxymethylcytosine. Each of the employed language models is adapted to the task using the “pre-train and fine-tune” paradigm. Pre-training is performed on a custom corpus of DNA fragments and taxonomy lineages using self-supervised learning. Fine-tuning aims at predicting the DNA-methylation status of each type. The five models are used to collectively predict the DNA methylation status. We report excellent performance of MuLan-Methyl on a benchmark dataset. Moreover, we argue that the model captures characteristic differences between different species that are relevant for methylation. This work demonstrates that language models can be successfully adapted to applications in biological sequence analysis and that joint utilization of different language models improves model performance. MuLan-Methyl is open source and we provide a web server that implements the approach.

      **Reviewer 2. Jianxin Wang**

      In this manuscript, the authors present MuLan-Methyl, a deep-learning framework for predicting 6mA, 4mC, and 5hmC sites. They use DNA sequence and taxonomic identity as features, and implement five popular transformer-based language models in MuLan-Methyl. MuLan-Methyl is open-sourced, and a web server is also provided for users to access it. Overall, I think the methodology of MuLan-Methyl is clear and innovative, and the experiments seem comprehensive. However, I do have several concerns that I believe should be addressed before the paper is accepted by GigaScience.

      Major

      1. One major concern is that, in my opinion, DNA methylation is dynamic. Cytosines in the same position of the DNA sequence may have different methylation status in different samples, different cells, or even in different development stages of a cell. So, how can we predict the methylation status of a site based on only its sequence (and taxonomic identity)?

      -- The authors should clarify that in what cases, MuLan-Methyl (as well as other methods that use only DNA sequence) can be used to study DNA methylation, in Introduction or Discussion section.

      -- The authors discuss motifs in Fig. 3, but only for positive samples. How about the motif distribution in the negative samples? Can I understand that this method is actually for discovering motifs (or sequence structures) that are highly correlated with methylation?

      -- How is the performance of MuLan-Methyl without taxonomic identity?

      2. The authors compared MuLan-Methyl against iDNA-ABF and iDNA-ABT, especially on the independent test set (Fig. 2E). I think the authors should clarify that whether they trained the models of the three methods using the same training datasets. If not, the authors should clarify the reason.

      3. I'm curious about the computational efficiency of MuLan-Methyl. How many parameters in its model? Does MuLan-Methyl have advantages over other methods in terms of computational efficiency?

      Minor

      1. I don't understand why the references were not ordered from 1 in the main text.

      2. I suggest that the authors re-organize the Introduction section. There are too many small paragraphs in this section.

      3. At the end of Page 2, "The type 4mC type is present in 4 species" should be corrected.

      Re-review:

      The authors have addressed most of my concerns. However, I still have one minor concern about the computational efficiency. The response of the authors is not convincing by only saying "The number of models that MuLan-Methyl need to train and test on is less than the others, thus it has better computational efficiency than other models to some extent". If possible, I strongly suggest that the authors show some data to compare how much time and resources (GPU/CPU/RAM) needed by each method.
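      One lightweight way to gather the kind of timing and memory figures requested here is sketched below, using only the Python standard library. The function `dummy_predict` is a hypothetical stand-in for a model's prediction call and is not part of MuLan-Methyl or any other tool. Note that `tracemalloc` only tracks memory allocated by the Python interpreter, so GPU memory and native library allocations would need a separate, framework-specific measurement.

```python
import time
import tracemalloc

def profile(fn, *args, **kwargs):
    """Run fn once and return (result, wall_seconds, peak_python_heap_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak

def dummy_predict(n):
    # Hypothetical placeholder workload standing in for a real prediction call
    return sum(i * i for i in range(n))

result, seconds, peak_bytes = profile(dummy_predict, 10_000)
print(f"wall time: {seconds:.4f} s, peak Python heap: {peak_bytes / 1024:.1f} KiB")
```

      Running each method through the same wrapper on the same input, ideally several times, would give directly comparable numbers.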

    1. Author Response

      Reviewer #1 (Public Review):

      This paper falls in a long tradition of studies on the costs of reproduction in birds and its contribution to understanding individual variation in life histories. Unfortunately, the meta-analyses only confirm what we know already, and the simulations based on the outcome of the meta-analysis have shortcomings that prevent the inferences on optimal clutch size, in contrast to the claims made in the paper.

      There was no information that I could find on the effect sizes used in the meta-analyses other than a figure listing the species included. In fact, there is more information on studies that were not included. This made it impossible to evaluate the data-set. This is a serious omission, because it is not uncommon for there to be serious errors in meta-analysis data sets. Moreover, in the long run the main contribution of a meta-analysis is to build a data set that can be included in further studies.

      It is disappointing that two referees comment on data availability, as we supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      The main finding of the meta-analysis of the brood size manipulation studies is that the survival costs of enlarging brood size are modest, as previously reported by Santos & Nakagawa on what I suspect to be mostly the same data set.

      We disagree that the main finding of our paper is the small survival cost of manipulated brood size. The major finding of the paper, in our opinion, is that the effect sizes for experimental and observational studies are in opposite directions, therefore providing the first quantitative evidence to support the influential theoretical framework put forward by van Noordwijk and de Jong (1986), that individuals differ in their optimal clutch size and are constrained to reproducing at this level due to a trade-off with survival. We show that while the manipulation experiments have been widely accepted to be informative, they are not in fact an effective test of whether within-species variation in clutch size is the result of a trade-off between reproduction and survival.

      The comment that we are reporting the same finding as Santos & Nakagawa (2012) is a misrepresentation of both that study and our own. Santos & Nakagawa found an effect of parental effort on survival only in males who had their clutch size increased – but no effect for males who had their clutch size reduced and no survival effect on females for either increasing or reducing parental effort. However, we found an overall reduction in survival for birds who had brood sizes manipulated to make them larger (for both sexes and mixed sex studies combined). In our supplementary information, we demonstrate the overall survival effect of a change in reproductive effort to be close to zero for males, negative (though non-significant) for females and significantly negative for mixed sexes (which are not included in the Santos & Nakagawa study).

      The paper does a very poor job of critically discussing whether we should take this at face value or whether instead there may be short-comings in the general experimental approach. A major reason why survival cost estimates are barely significantly different from zero may well be that parents do not fully adjust their parental effort to the manipulated brood size, either because of time/energy constraints, because it is too costly and therefore not optimal, or because parents do not register increased offspring needs. Whatever the reason, as a consequence, there is usually a strong effect of brood size manipulation on offspring growth and thereby presumably their fitness prospects. In the simulations (Fig.4), the consequences of the survival costs of reproduction for optimal clutch size were investigated without considering brood size manipulation effects on the offspring. Effects on offspring are briefly acknowledged in the discussion, but otherwise ignored. Assuming that the survival costs of reproduction are indeed difficult to discern because the offspring bear the brunt of the increase in brood size, a simulation that ignores the latter effect is unlikely to yield any insight in optimal clutch size. It is not clear therefore what we learn from these calculations.

      The reviewer’s comment is somewhat of a paradox. We take the best studied example of the trade-off between reproductive effort and parental survival, a key theme in life-history and the biology of ageing, and subject this to a meta-analysis. The reviewer suggests we should interpret our finding as if there must be something wrong with the method or studies we included, rather than considering that the original hypothesis could be false or inflated in importance. We do not consider questioning the premise of the data in favor of a held hypothesis to be the best scientific approach here. In many places in our manuscript we question and address issues in the underlying data and interpretation (L101-105, L149-150, L182-185 and L229-233). Moreover, we make it clear that we focus on the trade-off between current reproductive effort and subsequent parental survival and we are aware that other trade-offs could counter-balance or explain our findings, discussed on L189-191 & L246-253. Note that it is also problematic, when you do not find the expected response, to search for an alternative that has not been measured. In the case here, with trade-offs, there are endless possibilities of where a trade-off might be incurred between traits. We purposefully focus on the one well-studied and theorised trade-off. We clearly acknowledge, though, that when all possible trade-offs are taken into account a trade-off on the fitness level can occur, and we cite two famous studies (Daan et al., 1990 and Verhulst & Tinbergen 1991) that have done just that (L250-253).

      So whilst we agree with the reviewer that the offspring may incur costs themselves, rather than costs being incurred by the parents, the aim of our study was to test for a generalised trend across species in the survival costs of reproductive effort. It is unrealistic to suggest that incorporating offspring growth into our simulations would add insight, as a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth, for example, and so it is likely that increased sibling competition from added chicks alters offspring growth trajectories, rather than absolute growth as the reviewer suggests. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest.

      There are other reasons why brood size manipulations may not reveal the costs of reproduction animals would incur when opting for a larger brood size than they produced spontaneously themselves. Firstly, the manipulations do not affect the effort incurred in laying eggs (which also biases your comparison with natural variation in clutch size). Secondly, the studies by Boonekamp et al on Jackdaws found that while there was no effect of brood size manipulation on parental survival after one year of manipulation, there was a strong effect when the same individuals were manipulated in the same direction in multiple years. This could be taken to mean that costs are not immediate but delayed, explaining why single year manipulations generally show little effect on survival. It would also mean that most estimates of the fitness costs of manipulated brood size are not fit for purpose, because typically restricted to survival over a single year.

      First, our results did show a survival cost of reproduction for brood manipulations. We agree that there could be longer-term costs, and so our estimate of the survival cost for manipulated birds is likely to be an underestimate, meaning that our interpretation still holds: the cost of reproduction prevents individuals from laying beyond their optimal level. Note, however, that much theory is built on the immediate costs of reproduction and as such these costs are likely overinterpreted.

      We agree with the reviewer that lifetime manipulations could be even more informative than single-year manipulations. Unfortunately, there are currently too few studies available to be able to draw generalisable conclusions across species for lifetime manipulations. This is, however, the reason we used lifetime change in clutch size in our fitness projections, which the reviewer seems to have missed – please see methods line 360-362, where we explicitly state that this is lifetime enlargement. Of course such interpretations do not include an accumulation of costs that is greater than the annual cost, but currently there is no clear evidence that such an assumption is valid. Such a conclusion can also not be drawn from the study on jackdaws by Boonekamp et al (2014) as the treatments were life-long and, therefore, cannot separate annual from accrued (multiplicative) costs that are more than the sum of annual costs incurred.

      Details of how the analyses were carried out were opaque in places, but as I understood the analysis of the brood size manipulation studies, manipulation was coded as a covariate, with negative values for brood size reductions and positive values for brood size enlargements (and then variably scaled or not to control brood or clutch size). This approach implicitly assumes that the trade-off between current brood size (manipulation) and parental survival is linear, which contrasts with the general expectation that this trade-off is not linear. This assumption reduces the value of the analysis, and contrasts with the approach of Santos & Nakagawa.

      We thank the reviewer for highlighting a lack of clarity in places in our methods. We will add additional detail to this section in our revised manuscript.

      For clarity in our response, each effect size was extracted by performing a logistic regression with survival as a binary response variable and clutch size as the predictor, where clutch size was the absolute number of offspring in the nest (i.e., for a bird that laid a clutch of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). Clutch size was also standardised and, separately, expressed as a proportion of the species mean.

      We disagree that our approach reduces the value of our analysis. First, our approach allows a direct comparison between experimental and observational studies, which is the novelty of our study. Our approach does differ from Santos & Nakagawa but we disagree that it contrasts. Our approach allows us to take into consideration the severity of the change in clutch size, which Santos & Nakagawa do not. Therefore, we do not agree that our approach is worse at accounting for non-linearity of trade-offs than the approach used by Santos & Nakagawa.

      Our analysis, alongside a plethora of other ecological studies, does assume that the response to our predictor variable is linear. However, it is common knowledge that there are very few (if any) truly linear relationships. We use linear relationships because they serve as a good approximation of the trend and provide a more rigorous test for an underlying relationship than would fitting non-linear models. For many datasets, the range of chicks added is too narrow for a non-linear relationship to be estimated. The question also remains of what the shape of this non-linear relationship should be, which is hard to determine a priori. We will address non-linear effects in our revised manuscript.

      The observational study selection is not complete and apparently no attempt was made to make it complete. This is a missed opportunity - it would be interesting to learn more about interspecific variation in the association between natural variation in clutch size and parental survival.

      We clearly state in our manuscript that we deliberately made a tailored selection of studies that matched the manipulation studies (L279-282). We paired species extracted for observational studies with those extracted in experimental studies to facilitate a direct comparison between observational and experimental studies, and to ensure that the respective datasets were comparable. The reviewer’s focus in this review seems to be solely on the experimental dataset. This comment dismisses the observational component of our analysis and thereby fails to acknowledge the question being addressed in this study.

      Reviewer #2 (Public Review):

      I have read with great interest the manuscript entitled "The optimal clutch size revisited: separating individual quality from the costs of reproduction" by LA Winder and colleagues. The paper consists in a meta-analysis comparing survival rates from studies providing clutch sizes of species that are unmanipulated and from studies where the clutch sizes are manipulated, in order to better understand the effects of differences in individual quality and of the costs of reproduction. I find the idea of the manuscript very interesting. However, I am not sure the methodology used allows to reach the conclusions provided by the authors (mainly that there is no cost of reproduction, and that the entire variation in clutch size among individuals of a population is driven by "individual quality").

      We would like to highlight that we do not conclude that there is no cost of reproduction. Please see lines 258–260, where we state that our lack of evidence for trade-offs driving within-species variation in clutch size does not necessarily mean the costs of reproduction are non-existent. We conclude that individuals are constrained to their optima by the survival cost of reproduction. It is also an over-statement of our conclusion to say that we believe that variation in clutch size is only driven by quality. Our results show that unmanipulated birds who have larger clutch sizes also live longer, and we suggest this is evidence that some individuals are “better” than others, but we do not say, nor imply, that no other factors affect variation in clutch size.

      I write that I am not sure, because in its current form, the manuscript does not contain a single equation, making it impossible to assess. It would need at least a set of mathematical descriptions for the statistical analysis and for the mechanistic model that the authors infer from it.

      We appreciate this comment, but this is the first time we have been asked to put equations in a manuscript rather than explain them in terms that are accessible to a wider audience. Note however that our meta-analysis is standard and based on logistic regression and standard meta-analytic practices. We do not think we need to repeat such equations and we cite the relevant data. For the simulation, we simply simulated the resulting effects and this is not something that we feel is captured more accurately in equations rather than in text and the associated graphs. We of course supplied our code for this along with our manuscript (https://doi.org/10.5061/dryad.q83bk3jnk), though as we mentioned above, we believe this was not shared with the reviewers despite us making this available for the review process. We therefore understand the reviewer feels the simulations were not explained thoroughly. We will revise our text to see if we can add additional explanation where relevant in our revision.

      The text mixes concepts of individual vs population statistics, of within-individual vs among-individuals measures, of allocation trade-offs and fitness trade-offs, etc., which means it would also require a glossary of the definitions the authors use for these various terms, in order to be evaluated.

      We would like to thank the reviewer for highlighting this lack of clarity in our text. We will simplify the terminology and define terms in our revised manuscript.

      This problem is emphasised by the following sentence to be found in the discussion "The effect of birds having naturally larger clutches was significantly opposite to the result of increasing clutch size through brood manipulation". The "effect" is defined as the survival rate (see Fig 1). While it is relatively easy to intuitively understand what the "effect" is for the unmanipulated studies: the sensitivity of survival to clutch size at the population level, this should be mentioned and detailed in a formula. Moreover, the concept of effect size is not at all obvious for the manipulated ones (effect of the manipulation? or survival rate whatever the manipulation (then how could it measure a trade-off ?)? at the population level? at the individual level ?) despite a whole appendix dedicated to it. This absolutely needs to be described properly in the manuscript.

      We would like to thank the reviewer for bringing to our attention the lack of clarity on the details of our methodology. We will make this more clear in our revised manuscript.

      For clarity, the effect size for both manipulated and unmanipulated nests was survival, given the brood size raised. We performed a logistic regression with survival as a binary response variable (i.e., number of individuals that survived and number of individuals that died after each breeding season), and clutch size was the absolute value of offspring in the nest (i.e., for a bird who laid a clutch size of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). This allows for direct comparison of the effect size (survival given clutch size raised) between manipulated and unmanipulated birds.
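
      A minimal sketch of this kind of effect-size extraction is given below. It is an illustration under stated assumptions, not the code deposited with the manuscript: the counts are invented, the function names `clutch_size_raised` and `fit_logistic` are hypothetical, and the fit uses a plain Newton-Raphson routine rather than any particular statistics package. Survival is modelled as logit(p) = b0 + b1 × clutch size raised, and the slope b1 plays the role of the effect size.

```python
import math


def clutch_size_raised(laid, manipulation):
    """Absolute number of offspring in the nest after manipulation.

    Example from the text: a clutch of 5 manipulated by -1 egg gives 4.
    """
    return laid + manipulation


def fit_logistic(x, survived, died, iterations=25):
    """Fit logit(p) = b0 + b1 * x to grouped survival counts.

    x        : clutch size raised for each group
    survived : number of parents that survived in each group
    died     : number of parents that died in each group
    Returns (b0, b1); b1 is the log-odds change in survival per extra offspring.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, di in zip(x, survived, died):
            n = yi + di
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - n * p          # score contribution
            w = n * p * (1.0 - p)       # information weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * delta = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: five groups of 50 parents, survival declining with brood size.
raised = [clutch_size_raised(5, m) for m in (-2, -1, 0, 1, 2)]
survived = [30, 28, 25, 20, 15]
died = [20, 22, 25, 30, 35]

intercept, slope = fit_logistic(raised, survived, died)
print(f"intercept = {intercept:.3f}, slope per offspring = {slope:.3f}")
```

      Applying the same routine to unmanipulated nests, with the naturally laid clutch size as the predictor, would give a slope on the same scale, which is what makes the two kinds of study directly comparable in this sketch.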

      Despite the lack of information about the underlying mechanistic model tested and the statistical model used, my impression is still that the interpretation in the introduction and discussion is not granted by the outputs of the figures and tables. Let's use a model similar to that of (van Noordwijk and de Jong, 1986): imagine that the mechanism at the population level is

      a.c_(i,q)+b.s_(i,q)=E_q

      Where c_(i,q) and s_(i,q) are respectively the clutch size and the survival rate of individual i, which is of quality q, and E_q is the level of "energy" that an individual of quality q has available during the given time-step. Here a and b are constants turning the clutch size and survival rate into the energy cost of reproduction and the energy cost of survival, and they are both quite "high", so that an extra egg at the current time-step (c_(i,q) increased by 1) decreases s_(i,q) markedly (E_q is independent of the number of eggs produced); that is, we have strong individual costs of reproduction. Imagine now that the variance of c_(i,q) (when the population is not manipulated) among individuals of the same quality group is very small (and therefore the variance of s_(i,q) is also very small) and that the expectations of both are proportional to E_q. Then, in the unmanipulated population, the variance in clutch size is mainly due to the variance in quality. Therefore, the larger the clutch size c_(i,q), the higher E_q, and the higher the survival s_(i,q).

      In the manipulated populations however, because of the large a and b, an artificial increase in clutch size, for a given E_q, will lead to a lower survival s_(i,q). And the "effect size" at the population level may vary according to a,b and the variances mentioned above. In other words, the costs of reproduction may be strong, but be hidden by the data, when there is variance in quality; however there are actually strong costs of reproduction (so strong actually that they are deterministic and that the probability to survive is a direct function of the number of eggs produced)
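
      The thought experiment above can be sketched numerically. The following is an illustration only: the values of a and b, the uniform distribution of E_q, the size of the within-quality noise, and the function names `simulate` and `ols_slope` are all assumptions made for this example and do not come from the manuscript or the review. Each simulated individual satisfies the budget a·c + b·s = E exactly, both before and after one egg is added.

```python
import random


def simulate(n=2000, a=0.1, b=1.0, seed=1):
    """Simulate individuals obeying the energy budget a*c + b*s = E.

    Returns three lists: natural clutch size, natural survival, and
    survival after the clutch is experimentally enlarged by one egg.
    """
    rng = random.Random(seed)
    clutch, survival, survival_enlarged = [], [], []
    for _ in range(n):
        energy = rng.uniform(0.8, 1.8)               # individual quality E_q (assumed range)
        c = energy / (2 * a) + rng.gauss(0.0, 0.2)   # clutch roughly proportional to E_q
        s = (energy - a * c) / b                     # survival takes the remaining energy
        s_plus = (energy - a * (c + 1)) / b          # same bird, one extra egg
        clutch.append(c)
        survival.append(s)
        survival_enlarged.append(s_plus)
    return clutch, survival, survival_enlarged


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


clutch, survival, survival_enlarged = simulate()
observational = ols_slope(clutch, survival)
experimental = sum(e - s for s, e in zip(survival, survival_enlarged)) / len(survival)
print(f"slope across unmanipulated birds: {observational:+.3f}")
print(f"mean change per added egg:        {experimental:+.3f}")
```

      Under these assumed parameters the among-individual slope is positive while the within-individual effect of adding an egg is negative, which is the masking effect the paragraph describes. Other choices of a, b and the variances would change the magnitudes.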

      We would like to thank the reviewer for these comments. Please note that our simulations only take the experimental effect of brood size on parental survival into account. Our model does not incorporate quality effects. The reviewer is right that the relationship between quality and the effects exposed by manipulating brood size can take many forms and this is a very interesting topic, but not one we aimed to tackle in our manuscript. In terms of quality we make two points: 1) overall quality effects connecting reproduction and parental survival are present 2) these effects are opposite in direction to the effects when reproduction is manipulated and similar in magnitude. We do not go further than that in interpreting our results. The reviewer is right however that we do suggest and repeat suggestions by others that quality can also mask the trade-off in some individuals or circumstances (L63-65, L85-88 & L237-240), but we do not quantify this as this is dependent on the unknown relationships between quality and the response to the manipulation. A focussed set of experiments in that context would be interesting and there is some data that could get at this, i.e. the relationship between produced clutch size and the relative effect of the manipulation. Such information is however not available for all studies and although we explored also analyzing this, currently this is not possible to do with sufficient confidence. We will include this rationale in our revision.

      Moreover, it seems to me that the costs of reproduction are a concept closely related to generation time. Looking beyond the individual allocative (and other individual components of the trade-off) cost of reproduction and towards a populational negative relationship between survival and reproduction, we have to consider the intra-population slow fast continuum (some types of individuals survive more and reproduce less (are slower) than other (which are faster)). This continuum is associated with a metric: the generation time. Some individuals will produce more eggs and survive less in a given time-period because this time-period corresponds to a higher ratio of their generation time (Gaillard and Yoccoz, 2003; Gaillard et al., 2005). It seems therefore important to me, to control for generation time and in general to account for the time-step used for each population studied when analysing costs of reproduction. The data used in this manuscript is not just clutch size and survival rates, but clutch size per year (or another time step) and annual (or other) survival rates.

      The reviewer is right that this is interesting. There are unexplained differences between temperate (seasonal) and tropical reproductive strategies. Most of our data, however, come from seasonal breeders. Although there is some variation in second brooding and the like, these species often produce only one brood. We do agree that a wider consideration here is relevant, but we are not trying to explain all of life-history in our paper. It is clearly the case that other factors will operate and the opportunity for trade-offs will vary among species according to their respective life histories. However, our study focuses on the two most fundamental components of fitness – longevity and reproduction – to test a major hypothesis in the field, and we uncover new relationships that contrast with previous influential studies and cast doubt on previous conclusions. We question the assumed trade-off between reproduction and annual survival. We show that quality is important and that the effect we find in experimental studies is so small that it can only explain between-species patterns but is unlikely to be the selective force that constrains reproduction within species. We do agree that there is a lot more work that can be done in this area. We hope we contribute to this by questioning this central trade-off. We will try to incorporate some of these suggestions in the revision where possible.

      Finally, it is important to relate any study of the costs of reproduction in a context of individual heterogeneity (in quality for instance), to the general problem of the detection of effects of individual differences on survival (see, e.g., Fay et al., 2021). Without an understanding of the very particular statistical behaviour of survival, associated to an event that by definition occurs only once per life history trajectory (by contrast to many other traits, even demographic, where the corresponding event (production of eggs for reproduction, for example) can be measured several times for a given individual during its life history trajectory).

      Thank you for raising this point. The reviewer is right that heterogeneity can dampen or augment selection. Note that by estimating the effect of quality here we give an example of how heterogeneity can possibly do exactly this. We thank the reviewer for raising that we should possibly link this to wider effects of heterogeneity and we aim to do so in the revision.

      References:

      Fay, R. et al. (2021) 'Quantifying fixed individual heterogeneity in demographic parameters: Performance of correlated random effects for Bernoulli variables', Methods in Ecology and Evolution, 2021(August), pp. 1-14. doi: 10.1111/2041-210x.13728.

      Gaillard, J.-M. et al. (2005) 'Generation time: a reliable metric to measure life-history variation among mammalian populations.', The American naturalist, 166(1), pp. 119-123; discussion 124-128. doi: 10.1086/430330.

      Gaillard, J.-M. and Yoccoz, N. G. (2003) 'Temporal Variation in Survival of Mammals: a Case of Environmental Canalization?', Ecology, 84(12), pp. 3294-3306. doi: 10.1890/02-0409.

      van Noordwijk, A. J. and de Jong, G. (1986) 'Acquisition and Allocation of Resources: Their Influence on Variation in Life History Tactics', American Naturalist, p. 137. doi: 10.1086/284547.

      Reviewer #3 (Public Review):

      The authors present here a comparative meta-analysis designed to detect evidence for a reproduction/survival trade-off, central to expectations from life history theory. They present variation in clutch size within species as an observation in conflict with expectations of optimisation of clutch size and suggest that this may be accounted for by weak selection on clutch size. The results of their analyses support this explanation - they found little evidence of a reproduction-survival trade-off across birds. They extrapolated from this result to show in a mathematical model that the fitness consequences of enlarged clutch sizes would only be expected to have a significant effect on fitness in extreme cases, outside of normal species' clutch size ranges. Given the centrality of the reproduction-survival trade-off, the authors suggest that this result should encourage us to take a more cautious approach to applying concepts of the trade-off in life history theory and optimisation in behavioural ecology more generally. While many of the findings are interesting, I don't think the argument for a major re-think of life history theory and the role of trade-offs in fitness maximisation is justified.

      The interest of the paper, for me, comes from highlighting the complexities of the link between clutch size and fitness, and the challenges facing biologists who want to detect evidence for life history trade-offs. Their results highlight apparently contradictory results from observational and experimental studies on the reproduction-survival trade-off and show that species with smaller clutch sizes are under stronger selection to limit clutch size.

      Unfortunately, the authors interpret the failure to detect a life history trade-off as evidence that there isn't one. The construction of a mathematical model based on this interpretation serves to give this possible conclusion perhaps more weight than is merited on the basis of the results of this necessarily quite simple meta-analysis. There are several potential complicating factors that could explain the lack of detection of a trade-off in these studies, which are mentioned and dismissed as unimportant (lines 248-250) without any helpful, rigorous discussion. I list below just a selection of complexities which perhaps deserve more careful consideration by the authors to help readers understand the implications of their results:

      We would like to thank the reviewer for their thoughtful response and summary of the findings we also agree are central to our study. The reviewer also highlights areas where our manuscript could benefit from a deeper discussion and we will add detail to our discussion in our revised manuscript.

      We would like to highlight that we do not interpret the failure to detect a trade-off as evidence that there isn’t one. First, and importantly, we do find a trade-off but show this is only incurred when individuals lay beyond their optimal level. Secondly, we also state on lines 258-260 that the lack of evidence to support trade-offs being strong enough to drive variation in clutch size does not necessarily mean there are no costs of reproduction.

      The statement that we have constructed a mathematical model based on the interpretation that we have not found a trade-off is, again, factually incorrect. We ran these simulations because the opposite is true – we did find a trade-off. There is a significant effect of clutch size when manipulated on annual parental survival. To appreciate whether this effect alone can explain why reproduction is constrained, we ran the simulations. From these simulations we find that this effect size is too small to explain the constraint so something else must be going on and we do spend a considerable amount of text discussing the possible explanations (L182-194). Note the possibly most parsimonious conclusion here is that costs of reproduction are not there so we also give that explanation some thought (L201-205 and L247-253).

      We are disappointed by the suggestion that we have dismissed complicating factors which could prevent detection of a trade-off, as this was not our intention. We were aiming to highlight that what we have demonstrated to be an apparent trade-off can be explained through other mechanisms, and that the trade-off between clutch size and survival is not as strong in driving within-species variation in clutch size as previously assumed. Although we do feel we have addressed this (L248-255), we will add further discussion to our revised manuscript to make this clear and give readers a better understanding of the complexity of factors associated with life-history theory.

      • Reproductive output is optimised for lifetime reproductive success and so the consequences of being pushed off the optimum for one breeding attempt are not necessarily detectable in survival but in future reproductive success (and, therefore, lifetime reproductive success).

      We agree this is a valid point, which is mentioned in our manuscript in terms of alternative stages where the costs of reproduction might be manifested (L248-250). We would also like to highlight that in our simulations, the change in clutch size (and subsequent survival cost) was assumed for the lifetime of the individual, for this very reason.

      • The analyses include some species that hatch broods simultaneously and some that hatch sequentially (although this information is not explicitly provided (see below)). This is potentially relevant because species which have been favoured by selection to set up a size asymmetry among their broods often don't even try to raise their whole broods but only feed the biggest chicks until they are sated; any added chicks face a high probability of starvation. The first point this observation raises is that the expectation of more chicks= more cost, doesn't hold for all species. The second more general point is that the very existence of the sequential hatching strategy to produce size asymmetry in a brood is very difficult to explain if you reject the notion of a trade-off.

      We agree with the reviewer that the costs of reproduction can be absorbed by the offspring themselves, and may not be equal across offspring (we also highlight this at L249 in the manuscript). However, we disagree that for some species the addition of more chicks does not equate to an increase in cost, though we do accept this increase might be smaller for some species. This is, however, difficult to incorporate into a sensible model, as the impacts will vary among species and some species also exhibit catch-up growth. Without a priori knowledge of this, we kept our model simple: we tested whether the effect on parental survival (often assumed to be a strong cost) can explain the constraint on reproductive effort, and we conclude it does not.

      We would also like to make clear that we are not rejecting the notion of a trade-off. Our study shows evidence that a trade-off between survival and reproductive effort likely does not drive within-species variation in clutch size. We do explicitly say this throughout our manuscript, and also provide suggestions of other areas where a trade-off may exist (L246-250). The point of our study is not whether trade-offs exist or not, it is whether there is a generalisable across-species trend for a trade-off between reproductive effort and survival – the most fundamental trade-off in our field but for which there is a lack of conclusive evidence within species.

      • For your standard, pair-breeding passerine, there is an expectation that costs of raising chicks will increase linearly with clutch size. Each chick requires X feeding visits to reach the required fledge weight. But this is not the case for species which produce precocial chicks, which are relatively independent and able to feed themselves straight after hatching - so again the relationship of care and survival is unlikely to be detectable by looking at the effect of clutch size but again, it doesn't mean there isn't a trade-off between breeding and survival.

      Precocial birds still provide a level of parental care, such as protection from predators. Though we agree that the level of parental care in provisioning food (and in some cases in all parental care given) is lower in precocial than altricial birds, this would only make our reported effect size for manipulated birds an underestimate. Again, we would like to draw the reviewer’s attention to the fact that we did detect a trade-off in manipulated birds and we do not suggest that trade-offs do not exist. The argument the reviewer suggests here does not hold for unmanipulated birds, as we found that birds that naturally lay larger clutch sizes have higher survival.

      • The costs of raising a brood to adulthood for your standard pair-breeding passerine is bound to be extreme, simply by dint of the energy expenditure required. In fact, it was shown that the basal metabolic rate of breeding passerines was at the very edge of what is physiologically possible, the human equivalent being cycling the Tour de France (Nagy et al. 1990). If birds are at the very edge of what is physiologically possible, is it likely that clutch size is under weak selection?

      If birds are at the very edge of what is physiologically possible, then indeed it would necessarily follow that if they increase the resource allocated in one area then expenditure in another area must be reduced. In many studies however, the overall brood mass is increased when chicks are added and cared for in an experimental setting, suggesting that birds are not operating at their limit all the time. Our simulations show that if individuals increase their clutch size, the survival cost of reproduction counterbalances the fitness gained by increasing clutch size and so there is no overall fitness gain to producing more offspring. Therefore, selection on clutch size is constrained to the within-species level. We do not say in our manuscript that clutch size is under weak selection – we only ask why variation in clutch size is maintained if selection always favours high-producing birds.
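
      The counterbalancing argument can be illustrated with a deliberately simple projection. This sketch is not the simulation deposited with the manuscript: it assumes a constant annual survival probability (so that the expected number of breeding seasons is 1 / (1 - survival)), a survival cost applied on the logit scale for the whole lifetime, and hypothetical values for the baseline clutch size, baseline survival and cost per added egg (`beta`). All function and parameter names are invented for the example.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def lifetime_output(base_clutch, base_survival, added_eggs, beta):
    """Expected lifetime number of eggs under a lifelong clutch enlargement.

    base_clutch   : clutch size laid without enlargement
    base_survival : annual parental survival without enlargement
    added_eggs    : eggs added to every clutch for the whole lifetime
    beta          : change in log-odds of annual survival per added egg
    """
    survival = inv_logit(logit(base_survival) + beta * added_eggs)
    expected_seasons = 1.0 / (1.0 - survival)   # geometric lifespan assumption
    return (base_clutch + added_eggs) * expected_seasons


# Hypothetical species: clutch of 5, annual survival 0.5.
for beta in (0.0, -0.2, -0.5):
    baseline = lifetime_output(5, 0.5, 0, beta)
    enlarged = lifetime_output(5, 0.5, 1, beta)
    print(f"beta = {beta:+.1f}: baseline {baseline:.2f}, one extra egg {enlarged:.2f}")
```

      With no survival cost, an extra egg always raises projected lifetime output in this sketch. As the assumed cost per egg grows, the gain shrinks and can reverse, which is the sense in which a survival cost can counterbalance the benefit of a larger clutch.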

      • Variation in clutch size is presented by the authors as inconsistent with the assumption that birds are under selection to lay the Lack clutch. Of course, this is absurd and makes me think that I have misunderstood the authors' intended point here. At any rate, the paper would benefit from more clarity about how variable clutch size has to be before it becomes a problem for optimality in the authors' view (lines 84-85; line 246). See Perrins (1965) for an exquisite example of how beautifully great tits optimise clutch size on average, despite laying between 5-12 eggs.

      We would like to thank the reviewer for highlighting that our manuscript may be misleading in places; however, we are unsure which part of our conclusions the reviewer is referring to here. The question we pose is “why don’t all birds lay at the population optimum?”, which is central to the decades-long field of life-history theory. Why is variation maintained at such a level? As the reviewer outlines, it ranges massively, with some birds laying half of what other birds lay.

    1. How willing are we to acknowledge that our institutions, both their structures and cultures, have a history of, and may still in many ways be unsupportive and/or hostile to our students and their communities?

      I completely agree with this quote. I feel it relates closely to the education system over the 60+ years since POC began to receive education in schools. I believe there have been positive changes that resulted in them attending school. But I also believe that the education system in the US continues to fail them, since POC are minorities in America and were previously the poorest people on the planet. And still to this day the education system fails to help underprivileged students succeed in social institutions like schools. I feel this is especially so because the education system has not changed for years and is outdated, and only very recently have there been a few attempts to change it. But I think we need to reform the policies to help underprivileged students attending schools in America by, first, acknowledging that institutions failed POC; secondly, reforming the information that is taught; and lastly, creating more welcoming groups at schools to help them succeed socially, along with many other things to help students in the future.

    1. If I am understanding this right - I think the potential dilemma that arises from professional versus local archaeology is interesting. Local archeology efforts could provide insight into the past that would've gone unresearched otherwise, but with lower budgets and potentially greater mistakes (due to it not being 'professional quality' (?)) could harm the items in the dig site. Professional archaeology affords research in key places, motivated by economics, politics, or culture, and allows for the use of advanced techniques such as carbon dating. Unfortunately, this may remove the discovery that many of the locals of the area would've likely enjoyed performing (since it is their roots). How are we to weigh preservation, quality, and ethics together to form the idea of 'just' archeology?

    1. Back in 1945, there was this guy, Vannevar Bush. He was working for the US government, and one of the ideas that he put forth was, "Wow, humans are creating so much information, and we can't keep track of all the books that we've read or the connections between important ideas." And he had this idea called the "memex," where you could put together a personal library of all of the books and articles that you have access to. And that idea of connecting sources captured people's imaginations.
      • for: memex, Vannevar Bush, Indyweb, Ted Nelson
    1. The Science Behind Hydrogen Rich Water Machine

      In the health and wellness world, a fascinating trend has emerged with the rise of hydrogen-infused water machines. These innovative devices promise to deliver a refreshing beverage that goes beyond ordinary hydration: hydrogen-rich water. Packed with potential health benefits, the science behind these machines is captivating and sheds new light on how we think about water consumption and its impact on our well-being.

      Hydrogen: The Unsung Hero Of Molecules

      Before delving into the science of hydrogen-rich water machines, it's essential to understand the pivotal role of hydrogen itself. Hydrogen is the lightest and simplest element on the periodic table, consisting of a single proton and an electron. While hydrogen is generally known for its explosive nature, it has recently garnered attention for its potential health benefits when dissolved in water.

      The Power Of Hydrogen-Infused Water

      Hydrogen-infused water, often called hydrogen-rich water, is created when molecular hydrogen gas (H2) is dissolved into plain water. This process typically involves using advanced technologies found in hydrogen-rich water machines. The resulting beverage is touted for its potential antioxidant properties, which could contribute to various health improvements.

      Antioxidant Action: Hydrogen's Hidden Potential

      Antioxidants are essential for neutralizing dangerous chemicals known as free radicals, which may damage cells and contribute to a variety of health problems such as chronic illnesses and ageing. Molecular hydrogen is thought to have antioxidant characteristics that are more effective than well-known antioxidants such as vitamins C and E.

      Hydrogen's unique antioxidant potential lies in its ability to easily penetrate cell membranes and access cellular compartments, including the nucleus and mitochondria. This attribute gives hydrogen an edge in protecting cellular components from oxidative stress, potentially reducing the risk of oxidative damage.

      The Mechanism: How Hydrogen Works Its Magic

      The exact mechanism behind hydrogen's antioxidant effects is still an area of ongoing research, but several theories have been proposed. One prominent theory suggests that hydrogen is a selective scavenger of harmful free radicals, targeting the most reactive and damaging ones without affecting beneficial molecules like oxygen or nitric oxide.

      Another theory is that hydrogen has the power to modify signalling pathways within cells. By altering these pathways, hydrogen may elicit preventive responses that boost the body's natural defence systems against oxidative stress and inflammation.

      Hydrogen-Rich Water Machines: The Technology

      Hydrogen-rich water machines are designed to harness the power of molecular hydrogen by infusing it into plain drinking water. These devices commonly use electrolysis, which involves sending an electric current through water to divide it into hydrogen and oxygen gases. The hydrogen gas is subsequently dissolved in water, yielding a beverage high in this beneficial chemical.

      These machines are equipped with advanced membranes that allow only hydrogen molecules to pass through while preventing the escape of potentially harmful byproducts like ozone. This ensures the purity and safety of the resulting hydrogen-infused water.

      Potential Health Benefits

      While research on the health benefits of hydrogen-rich water is still in its infancy, preliminary studies have shown promising results. Some of the potential benefits include the following:

      Antioxidant Defense: Hydrogen-rich water's antioxidant properties could help reduce oxidative stress and associated health risks.

      Anti-Inflammatory Effects: Hydrogen may have anti-inflammatory effects that could benefit conditions like arthritis and other inflammatory disorders.

      Cellular Health: Hydrogen might contribute to overall cellular health and function by protecting cellular components.

      Exercise Performance: Some research suggests that hydrogen-rich water might enhance exercise performance and reduce muscle fatigue.

      Conclusion: A Glimpse Into The Future Of Hydration

      Hydrogen-rich water machines are ushering in a new era of hydration, where molecular hydrogen's benefits are harnessed to potentially enhance our well-being. While more research is needed to fully understand the extent of these benefits, the early findings are exciting and have sparked interest among health-conscious individuals.

      As technology advances, we can anticipate more refined hydrogen-infused water machines and a deeper understanding of how molecular hydrogen interacts with our bodies. Whether you're an early adopter or a cautious observer, the science behind these machines invites us to explore the intriguing potential of hydrogen-infused water and its impact on our health.

    1. Reviewer #1 (Public Review):

      Summary:

      This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, parahippocampal gyrus, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though I have substantive concerns about how this analysis was performed and as such will not summarize the results. Broadly, the behavioural and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and takes a step towards characterizing how.

      Strengths:

      The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled; for example, the authors include new recently learned pairs during each session.

      Weaknesses:

      As noted above, the pattern similarity analysis for both item and category-level reinstatement was performed in a way that is not interpretable given concerns about temporal autocorrelation within the scanning run. Below, I focus my review on this analytic issue, though I also outline additional concerns.

      1. The pattern similarity analyses were not done correctly, rendering the results uninterpretable (assuming my understanding of the authors' approach is correct).

      a. First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within the scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, but I don't believe this is possible unfortunately given the authors' design; I believe the target (presumably reinstated) scene only appears once during scanning, so there is no separate neural pattern during the presentation of this picture that they can use. For these reasons, any evidence for "significant scene-specific reinstatement" and the like is completely uninterpretable and would need to be removed from the paper.

      b. From a theoretical standpoint, I believe the way this analysis was performed considering the fixation and the immediately following scene also means that the differences between recent and remote could have to do with either the reactivation (processes happening during the fixation, presumably) or differences in the processing of the stimulus itself (happening during the scene presentation). For example, people might be more engaged with the more novel scenes (recent) and therefore process those scenes more; such a difference would be interpreted in this analysis as having to do with reinstatement, but in fact could be just related to the differential scene processing/recognition, etc. It would be important when comparing scene-specific neural patterns as templates for reinstatement across conditions that, at the time of scene presentation itself, the two conditions are equal (e.g., no difference in familiarity and so on); otherwise, we do not know which trial period (and therefore which underlying process) is driving the differences.

      c. For the category-based neural reinstatement:

      (1) This suffers from the same issue of correlations being performed within the run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). With this restriction, it may or may not be possible to perform this analysis, depending upon how the same-category scenes are distributed across runs. However, there are other issues with this analysis, as well.

      (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. The authors do not motivate the reason for this switch. Please provide reasoning as to why fixation-fixation is more appropriate than fixation-scene similarity for category-level reinstatement, particularly given the opposite was used for item-level reinstatement. Even if the analyses were done properly, it would remain hard to compare them given this difference in approach.

      (3) I believe the fixation cross with itself is included in the "within category" score. Is this not a single neural pattern correlated with itself, which will yield maximal similarity (Pearson r=1) or minimal dissimilarity (1-Pearson r=0)? Including these comparisons in the averages for the within-category score will inflate the difference between the "within-category" and "between-category" comparisons. These (e.g., forest1-forest1) should not be included in the within-category comparisons considered; rather, they should be excluded, so the fixations are always different but sometimes the comparisons are two retrievals of the same scene type (forest1-forest2), and other times different scene types (forest1-field1).

      (4) It is troubling that the results from the category reinstatement metric do not seem to conceptually align with past work; for example, a lot of work has shown category-level reinstatement in adults. Here the authors do not show any category-level reinstatement in adults (yet they do in children), which generally seems extremely unexpected given past work and I would guess has to do with the operationalization of the metric.
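
      The two corrections requested in point 1c (cross-run comparisons only, and no pattern correlated with itself) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `category_reinstatement`, the Fisher z step, and the input layout (one row per fixation-period pattern, plus a category label and a run label per row) are all assumptions made for the example.

```python
import numpy as np

def category_reinstatement(patterns, categories, runs):
    """Within- minus between-category similarity, using cross-run pairs only.

    patterns   : (n_trials, n_voxels) array, one neural pattern per trial
    categories : length n_trials, scene category label per trial
    runs       : length n_trials, scanning run label per trial
    (All three inputs are hypothetical names for this sketch.)
    """
    patterns = np.asarray(patterns, dtype=float)
    categories = np.asarray(categories)
    runs = np.asarray(runs)

    # Trial-by-trial Pearson correlation matrix (rows are trials).
    r = np.corrcoef(patterns)

    # Upper triangle with k=1: each unordered pair once, diagonal excluded,
    # so no pattern is ever correlated with itself.
    i, j = np.triu_indices(len(patterns), k=1)

    cross_run = runs[i] != runs[j]          # drop all within-run pairs
    same_cat = categories[i] == categories[j]

    # Fisher z-transform before averaging (clipped to avoid infinities).
    z = np.arctanh(np.clip(r[i, j], -0.999999, 0.999999))

    within = z[cross_run & same_cat]
    between = z[cross_run & ~same_cat]
    if within.size == 0 or between.size == 0:
        raise ValueError("no cross-run pairs available for one condition")
    return within.mean() - between.mean()
```

      If a design contains no same-category trials in different runs, the function raises an error rather than silently falling back to within-run pairs, which is the situation the reviewer flags as possibly making the analysis infeasible.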

      2. I did not see any compelling statistical evidence for the claim of less robust consolidation in children. Specifically in terms of the behavioural results of retention of the remote items at 1 vs 14 days, shown in Figure 2B, the authors conclude that memory consolidation is less robust in children (line 246). Yet they do not report statistical evidence for this point, as there was no interaction of this effect with the age group. Children had worse memory than adults overall (in terms of a main effect - i.e. across recent and remote items). If it were consolidation-specific, one would expect that the age differences are bigger for the remote items, and perhaps even most exaggerated for the 14-day-old memories. Yet this does not appear to be the case based on the data the authors report. Therefore, the behavioural differences in retention do not seem to be consolidation specific, and therefore might have more to do with differences in encoding fidelity or retrieval processes more generally across the groups. This should be taken into account when interpreting the findings.

      3. Please clarify which analyses were restricted to correct retrievals only. The univariate analysis section states that correct and incorrect trials were modelled separately, but does not say which were considered in the main contrast (I assume correct only?). The item-specific reinstatement analysis states that only correct trials were considered, but the category-level reinstatement analysis does not say. Please include this detail.

      4. To what extent could performance differences be impacting the differences observed across age groups? I think (see prior comment) that the analyses were probably limited to correct trials, which is helpful, but still yields pretty big differences across groups in terms of the amount of data going into each analysis. In general, children showed more attenuated neural effects (e.g., recent/remote or session effects); could this be explained by their weaker memory? Specifically, if only correct trials are considered that means that fewer trials would be going into the analysis for kids, especially for the 14-day remote memories, and perhaps pushing the remote > recent difference for this condition towards 0. The authors might be able to address this analytically; for example, does the remote > recent difference in the univariate data at day 14 correlate with day 14 memory?

      5. Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). This difference from zero or lack thereof seems important to the message - is that correct? If so, can the authors incorporate descriptions of these findings?

      6. Please provide more details about the choices available for locations in the 3AFC task. (1) Were they different each time, or always the same? If they are always the same, could this be a motor or stimulus/response learning task? (2) Do the options in the 3AFC always come from the same area - in which case the participant is given a clue as to the gist of the location/memory? Or are they sometimes randomly scattered across the image (in which case gist memory, like at a delay, would be sufficient for picking the right option)? Please clarify these points and discuss the logic/impact of these choices on the interpretation of the results.

      7. Often p values are provided but test statistics, effect sizes, etc. are not - please include this information. It is at times hard to tell whether the authors are reporting main effects, interactions, pairwise comparisons, etc.

      8. There are not enough methodological details in the main paper to make sense of the results. For example, it is not clear from reading the text that there are new object-location pairs learned each day.

      9. The retrieval task does not seem to require retrieval of the scene itself, and as such it would be helpful for the authors to explain their reasoning for using this task to measure reinstatement. Strictly speaking, participants could just remember the location of the object on the screen. Was it verified that children and adults were recalling the actual scene rather than just the location (e.g. via self-report)? It's possible that there may be developmental differences in the tendency to reinstate the scene depending on e.g., their strategy.

      10. In general I found the Introduction a bit difficult to follow. Below are a few specific questions I had.

      a. At points findings are presented but the broader picture or take-home point is not expressed directly. For example, lines 112-127, these findings can all be conceptualized within many theories of consolidation, and yet those overarching frameworks are not directly discussed (e.g., that memory traces go from being more reliant on the hippocampus to more on the neocortex). Making these connections directly would likely be helpful for many readers.

      b. Lines 143-153 - The comparison of the Tompary & Davachi (2017) paper with the Oedekoven et al. (2017) reads like the two analyses are directly comparable, but the authors were looking at different things. The Tompary paper is looking at organization (not reinstatement); while the Oedekoven et al. paper is measuring reinstatement (not organization). The authors should clarify how to reconcile these findings.

      c. Line 195-6: I was confused by the prediction of "stable involvement of HC over time" given the work reviewed in the Introduction that HC contribution to memory tends to decrease with consolidation. Please clarify or rephrase.

      d. Lines 200-202: I was a bit confused about this prediction. Firstly, please clarify whether immediate reinstatement has been characterized in this way for kids versus adults. Secondly, don't adults retain gist more over long delays (with specific information getting lost), at least behaviourally? This prediction seems to go against that; please clarify.

    1. Author Response

      We thank the reviewers for their work and their thoughtfulness. However, it seems to us that much (but not all) of the critique reflects a misunderstanding of the goals and methods of computational modeling. Details are below. We are grateful for the opportunity to include our views about this in the context of our replies to the Public Critiques of our paper. The comments of the reviewers were very helpful in allowing us to see what might not be clear to our readers.

      eLife assessment

      This useful modeling study explores how the biophysical properties of interneuron subtypes in the basolateral amygdala enable them to produce nested oscillations whose interactions facilitate functions such as spike-timing-dependent plasticity. The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered. This work will be of interest to investigators studying circuit mechanisms of fear conditioning as well as rhythms in the basolateral amygdala.

      Most of our comments below are intended to rebut the sentence: “The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered”. Details are below in the answer to reviewers.

      We believe this work will be interesting to investigators interested in dynamics associated with plasticity, which goes beyond fear learning. It will also be of interest because of its emphasis on the interactions of multiple kinds of interneurons that produce dynamics used in plasticity, in the cortex (which has similar interneurons) as well as BLA.

      We note that the model has sufficiently detailed physiology to make many predictions that can be tested experimentally. In the revision, we will be more explicit about this.

      We thank Reviewer #1 for stressing our work's contribution in providing concrete hypotheses that can be tested in vivo, and for highlighting the importance of examining, in future work, the synergistic role of BLA interneurons in fear learning. The weaknesses reported by the Reviewer concern deviations of the model from the experimental literature. We describe below why we think those differences are minor in the context of the aims of our model. Specifically,

      1) Some connections among neurons in the BLA reported by (Krabbe et al., 2019) have not been taken into account in the model. Some connections between cell types were excluded without adequate justification (e.g. SOM+ to PV+).

      In order to constrain our model, we focused on what is reported in (Krabbe et al., 2019) in terms of functional connectivity instead of structural connectivity. Thus, we included only those connections for which there was strong functional connectivity. For example, the SOM+ to PV+ connection is shown to be small (Supp. Fig. 4, panel t). We also omitted PV+ to SOM+, PV+ to VIP+, SOM+ to VIP+, VIP+ to excitatory projection neurons; all of these are shown in (Krabbe et al. 2019, Fig. 3 (panel l), and Supp. Fig. 4 (panels m,t)) to have weak functional connectivity, at least in the context of fear conditioning. See below for comments on modeling strategies. We will explain this better in our revision.

      2) The construction of the afferent drive to the network does not reflect the stimulus presentations that are given in fear conditioning tasks. For instance, the authors only used a single training trial, the conditioning stimulus was tonic instead of pulsed, the unconditioned stimulus duration was artificially extended in time, and its delivery overlapped with the neutral stimulus, instead of following its offset. These deviations undercut the applicability of their findings.

      Regarding the use of a single long presentation of US rather than multiple presentations (i.e., multiple trials): in early versions of this paper, we did indeed use multiple presentations. We were told by experimental colleagues that the learning could be achieved in a single trial. We note that, if there are multiple presentations in our modeling, nothing changes; once the association between CS and US is learned, the conductance of the synapse is stable. Also, our model does not need a long period of US if there are multiple presentations. This point will be made clearer in our revision.

      We agree that, in order to implement the fear conditioning paradigm in our in-silico network, we made several assumptions about the nature of the CS and US inputs affecting the neurons in the BLA and the duration of these inputs. A Poisson spike train to the BLA is a signal that contains no structure that could influence the timing of the BLA output; hence, we used this as our CS input signal. We also note that the CS input can be of many forms in general fear conditioning (e.g., tone, light, odor), and we wished to de-emphasize the specific nature of the CS. The reference mentioned in the Recommendations for authors, (Quirk, Armony, and LeDoux 1997), uses pulses 2 seconds long. At the end of fear conditioning, the response to those pulses is brief. However, in the early stages of conditioning, the response goes on for as long as the figure shows. The authors do show the number of cells responding decreases from early to late training, which perhaps reflects increasing specificity over training. This feature is not currently in our model, but we look forward to thinking about how it might be incorporated. Regarding the CS pulsed protocol used in (Krabbe et al., 2019), it has been shown that intense inputs (6kHz and 12 kHz inputs) can lead to metabotropic effects that last much longer than the actual input (200 ms duration) (Whittington et al., Nature, 1995). Thus, the effective input to the BLA may indeed be more like Poisson.
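
      As an illustration of what an input "that contains no structure" looks like, a homogeneous Poisson spike train can be generated with a per-bin Bernoulli approximation. This is a minimal sketch and not the model's actual input code: the function name, the 50 Hz rate, the 10 s duration, and the 0.1 ms time step are all assumed values chosen for the example.

```python
import numpy as np

def poisson_spike_train(rate_hz, duration_s, dt_s=1e-4, seed=None):
    """Return spike times (in seconds) of a homogeneous Poisson process.

    Each time bin of width dt_s independently contains a spike with
    probability rate_hz * dt_s, which approximates a Poisson process
    when that probability is much smaller than 1.
    """
    rng = np.random.default_rng(seed)
    n_bins = int(round(duration_s / dt_s))
    spikes = rng.random(n_bins) < rate_hz * dt_s
    return np.flatnonzero(spikes) * dt_s

# Hypothetical example: a 50 Hz train lasting 10 s.
spike_times = poisson_spike_train(rate_hz=50.0, duration_s=10.0, seed=0)
```

      Because every bin is drawn independently, the train carries no temporal pattern of its own, so any rhythmic timing in the network output has to arise from the network.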

      Our model requires the effect of the CS and US inputs on the BLA neuron activity to overlap in time in order to instantiate fear learning. Despite paradigms involving both overlapping (delay conditioning, where US coterminates with CS (Lindquist et al., 2004), or immediately follows CS (e.g., Krabbe et al., 2019)) and non-overlapping (trace conditioning) CS/US inputs existing in the literature, we hypothesized that concomitant activity in CS- and US-encoding neuron activity should be crucial in both cases. This may be mediated by the memory effect, as suggested in the Discussion of our paper, or by metabotropic effects as suggested above, or by the contribution from other brain regions. We will emphasize in our revision that the overlap in time, however instantiated, is a hypothesis of our model. It is hard to see how plasticity can occur without some memory trace of US. This is a consequence of our larger hypothesis that fear learning uses spike-timing-dependent plasticity; such a hypothesis about plasticity is common in the modeling literature. We will discuss these points in more detail in our revision.

      We thank Reviewer #2 for their comments. Below, we reply to each of them:

      1) Gamma oscillations are generated locally; thus, it is appropriate to model them in any cortical structure. However, the generation of theta rhythms is based on the interplay of many brain areas; therefore, local circuits may not be sufficient to model these oscillations. Moreover, to generate the classical theta, a laminar arrangement is needed (where neurons form layers like in the hippocampus and cortex) (Buzsaki, 2002), which is clearly not present in the BLA. To date, I am not aware of any study which has demonstrated that theta is generated in the BLA. All studies that recorded theta in the BLA performed the recordings referenced to a ground electrode far away from the BLA, an approach that can easily pick up volume-conducted theta rhythm generated e.g., in the hippocampus or other layered cortical structure. To clarify whether theta rhythm can be generated locally, one should have conducted recordings referenced to a local channel (see Lalla et al., 2017 eNeuro). In summary, at present, there is no evidence that theta can be generated locally within the BLA. Although there can be BLA neurons whose firing shows theta rhythmicity, e.g., driven by hippocampal afferents at theta rhythm, this does not mean that theta rhythm per se can be generated within the BLA, as the structure of the BLA does not support generation of rhythmic current dipoles. This questions the rationale of using theta as a proxy for BLA network function which does not necessarily reflect the population activity of local principal neurons in contrast to that seen in the hippocampus.

      In both modeling and experiments, a laminar structure does not seem to be needed to produce a theta rhythm. A recent experimental paper, (Antonoudiou et al. 2021), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone. The authors draw this conclusion from ex vivo mouse slices. The currents that generate these rhythms are in the BLA, since the hippocampus was removed to eliminate hippocampal volume conduction and other nearby brain structures did not display any oscillatory activity. Also, in the modeling literature, there are multiple examples of the production of theta rhythms in small networks not involving layers; these papers explain the mechanisms producing theta from non-laminated structures (Dudman et al., 2009, Kispersky et al., 2010, Chartove et al. 2020). We are not aware of any model description of the mechanisms of theta that does require layers.

      2) The authors distinguished low and high theta. This may be misleading, as the low theta they refer to is basically a respiratory-driven rhythm typically present during an attentive state (Karalis and Sirota, 2022; Bagur et al., 2021, etc.). Thus, it would be more appropriate to use breathing-driven oscillations instead of low theta. Again, this rhythm is not generated by the BLA circuits, but by volume conducted into this region. Yet, the firing of BLA neurons can still be entrained by this oscillation. I think it is important to emphasize the difference.

      Many rhythms of the nervous system can be generated in multiple parts of the brain by multiple mechanisms. We do not dispute that low theta appears in the context of respiration; however, this does not mean that other rhythms with the same frequencies are driven by respiration. Indeed, in the above answer we showed that theta can appear in the BLA without inputs from other regions. In our paper, the low theta is generated in the BLA by VIP+ neurons. Using intrinsic currents known to exist in VIP+ neurons (Porter et al., 1998), modeling has shown that such neurons can intrinsically produce a low theta rhythm. This is also shown in the current paper. This example is part of a substantial literature showing that there are multiple mechanisms for any given frequency band. We will emphasize these points in our revision; we note that, for any individual case, such as this one, the mechanism needs to be tested experimentally.

      3) The authors implemented three interneuron types in their model, ignoring a large fraction of GABAergic cells present in the BLA (Vereczki et al., 2021). Recently, the microcircuit organization of the BLA has been more thoroughly uncovered, including connectivity details for PV+ interneurons, firing features of neurochemically identified interneurons (instead of mRNA expression-based identification, Sosulina et al., 2010), synaptic properties between distinct interneuron types as well as principal cells and interneurons using paired recordings. These recent findings would be vital to incorporate into the model instead of using results obtained in the hippocampus and neocortex. I am not sure that a realistic model can be achieved by excluding many interneuron types.

      The interneurons and connectivity that we used were inspired by the functional connectivity reported in (Krabbe et al., 2019) (see above answer to Reviewer #1). As reported in (Vereczki et al., 2021), there are multiple categories and subcategories of interneurons; that paper does not report on which ones are essential for fear conditioning. We did use all the highly represented categories of the interneurons, except NPY-containing neurogliaform cells.

      The Reviewer says “I am not sure that a realistic model can be achieved by excluding many interneuron types”. We agree with the Reviewer that omitting other interneuron subtypes and a description of more specific connectivity (soma-, dendrite-, and axon-targeting connections) may limit the ability of our model to describe all the details in the BLA. However, this work represents a first effort towards a biophysically detailed description of the BLA rhythms and their function. As in any modeling approach, assumptions about what to describe and test are determined by the scientific question; details postulated to be less relevant are omitted to obtain clarity. The interneuron subtypes we modeled, especially VIP+ and PV+, have been reported to have a crucial role in fear conditioning (Krabbe et al., 2019). Other interneurons, e.g. cholecystokinin and SOM+, have been suggested as essential in fear extinction. Thus, in the follow-up of this work to explain fear extinction, we will introduce other cell types and connectivity. In the current work, we have achieved our goals of explaining the origin of the experimentally found rhythms and their roles in the production of plasticity underlying fear learning. Of course, a more detailed model may reveal flaws in this explanation, but this is science that has not yet been done.

      4) The authors set the reversal potential of GABA-A receptor-mediated currents to -80 mV. What was the rationale for choosing this value? The reversal potential of IPSCs has been found to be -54 mV in fast-spiking (i.e., parvalbumin) interneurons and around -72 mV in principal cells (Martina et al., 2001, Veres et al., 2017).

      A GABA-A reversal potential around -80 mV is common in the modeling literature (Jensen et al., 2005; Traub et al., 2005; Kumar et al., 2011; Chartove et al., 2020). Other computational works of the amygdala, e.g. (Kim et al., 2016), consider GABA-A reversal potential at -75 mV based on the cortex (Durstewitz et al., 2000). The papers cited by the reviewer have a GABA-A reversal potential of -72 mV for synapses onto pyramidal cells; this is sufficiently close to our model that it is not likely to make a difference. For synapses onto PV+ cells, the papers cited by the reviewer suggest that the GABA-A reversal potential is -54 mV; such a reversal potential would lead these synapses to be excitatory instead of inhibitory. However, it is known (Krabbe et al., 2019; Supp. Fig. 4b) that such synapses are in fact inhibitory. Thus, we wonder if the measurements of Martina and Veres were made in a condition very different from that of Krabbe. For all these reasons, we consider a GABA-A reversal potential around -80 mV in amygdala to be a reasonable assumption. We will discuss these points in our revision.
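
      The sign argument in this reply rests on the usual conductance-based form of the synaptic current, I = g (V - E_rev). The following is a minimal sketch of that relation only; the membrane potentials (-65 mV and -50 mV) and the 1 nS conductance are illustrative assumptions and are not taken from the model.

```python
def gaba_current(v_mv, e_rev_mv, g_ns=1.0):
    """Synaptic current in pA for I = g * (V - E_rev).

    With this sign convention a positive value is an outward
    (hyperpolarizing) current and a negative value is an inward
    (depolarizing) current. nS times mV gives pA.
    """
    return g_ns * (v_mv - e_rev_mv)

# Hypothetical membrane potentials, chosen only for illustration.
for v in (-65.0, -50.0):
    for e_rev in (-80.0, -72.0, -54.0):
        print(f"V={v} mV, E_rev={e_rev} mV -> I={gaba_current(v, e_rev)} pA")
```

      Under these assumed values, reversal potentials of -80 mV and -72 mV both give outward current at either membrane potential, whereas a reversal potential of -54 mV gives inward current at -65 mV and outward current at -50 mV, so its net effect depends on where the membrane potential sits relative to -54 mV.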

      5) Proposing neuropeptide VIP as a key factor for learning is interesting. Though, it is not clear why this peptide is more important in fear learning in comparison to SST and CCK, which are also abundant in the BLA and can effectively regulate the circuit operation in cortical areas.

      We do not think that VIP is necessarily more fundamental in fear learning, and certainly not for fear extinction. We will make this clear in the revision.

      We thank Reviewer #3 for their comments and for recognizing that we achieved our modeling aims. We reply to the criticisms below.

      Weaknesses:

      The main weakness of the approach is the lack of experimental data from the BLA to constrain the biophysical models. This forces the authors to use models based on other brain regions and leaves open the question of whether the model really faithfully represents the basolateral amygdala circuitry. Furthermore, the authors chose to use model neurons without a representation of the morphology. However, given that PV+ and SOM+ cells are known to preferentially target different parts of pyramidal cells and given that the model relies on a strong inhibition from SOM to silence pyramidal cells, the question arises whether SOM inhibition at the apical dendrite in a model representing pyramidal cell morphology would still be sufficient to provide enough inhibition to silence pyramidal firing. Lastly, the fear learning relies on the presentation of the unconditioned stimulus over a long period of time (40 seconds). The authors justify this long-lasting input as reflecting not only the stimulus itself but as a memory of the US that is present over this extended time period. However, the experimental evidence for this presented in the paper is only very weak.

      Many of these issues were addressed in the previous responses.

      1) Our neurons were constrained by electrophysiological properties in response to hyperpolarizing currents in the BLA (Sosulina et al., 2010). We chose the specific currents, known to be present in these neurons, to replicate those responses.

      2) Though a much more detailed description of BLA interneurons was given in (Vereczki et al., 2021), it is not clear that this level of detail is relevant to the questions that we were asking, especially since the experiments described were not done in the context of fear learning.

      3) It is true that we did not include the morphology, which undoubtedly makes a difference to some aspects of the circuit dynamics. As we described above, modeling requires the omission of many details to bring out the significance of other details.

      4) As described above, some form of memory or overlap in the activity of the excitatory projection neurons is necessary for spike-timing-dependent plasticity. In modeling, one must be specific about hypotheses, and describe why they are plausible, if not proved; indeed, modeling can explain known phenomena by showing how they are consequences of some (plausible) hypotheses, which themselves are open to experimental verification.

      5) The 40 seconds is not necessary if there are multiple presentations.

      Other critiques:

      1) It is correct that PV+ and SOM+ preferentially target different parts of excitatory projection neurons and that the model relies on a strong inhibition from SOM+ and PV+ to silence the excitatory projection neurons. This choice of parameters comes from using simplified models: it is standard in modeling to adjust parameters to compensate for simplifications.

      2) The SOM+ inhibition of the pyramidal cell firing can be seen as a hypothesis of our model. It is well known that VIP+ cells disinhibit pyramidal cells through inhibition of SOM+ and PV+ cells, which is all we are using in our model; hence this hypothesis is generally believed.

      The authors achieved the aim of constructing a biophysically detailed model of the BLA not only capable of fear learning but also showing spectral signatures seen in vivo. The presented results support the conclusions with the exception of a potential alternative circuit mechanism demonstrating fear learning based on a classical Hebbian (i.e. non-depression-dominated) plasticity rule, which would not require the intricate interplay between the inhibitory interneurons. This alternative circuit is mentioned but a more detailed comparison between it and the proposed circuitry is warranted.

      We agree with the reviewer that it would be good to have a more detailed comparison with the classical Hebbian rule (non-depression-dominated rule). However, we demonstrated in Supplementary Materials that the non-depression-dominated rule is less robust and only operates within a limited window of PV+ excitation. We will have a more robust discussion of plasticity in the revision.
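      For readers unfamiliar with the distinction, the following is a minimal sketch of a generic pair-based spike-timing-dependent plasticity kernel, showing one common reading of "depression-dominated": the depression amplitude outweighs the potentiation amplitude, so uncorrelated pre/post spiking weakens the synapse on average. The function names, amplitudes and time constants are hypothetical and are not the parameters of the model discussed here.

```python
import math

# Minimal sketch of a pair-based STDP kernel. ASSUMPTIONS: exponential windows,
# hypothetical amplitudes and time constants; this is not the authors' rule.


def stdp_weight_change(dt_ms, a_plus, a_minus, tau_plus_ms=20.0, tau_minus_ms=20.0):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre.

    Pre-before-post (dt > 0) potentiates; post-before-pre (dt < 0) depresses.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus_ms)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus_ms)
    return 0.0


def kernel_integral(a_plus, a_minus, tau_plus_ms=20.0, tau_minus_ms=20.0):
    """Integral of the kernel over all dt: a_plus*tau_plus - a_minus*tau_minus.

    A negative value means uncorrelated firing depresses the synapse on average.
    """
    return a_plus * tau_plus_ms - a_minus * tau_minus_ms


if __name__ == "__main__":
    # Hypothetical "classical" (balanced) versus depression-dominated amplitudes.
    print("balanced:", kernel_integral(a_plus=1.0, a_minus=1.0))
    print("depression-dominated:", kernel_integral(a_plus=1.0, a_minus=1.5))
```

      In this sketch, a depression-dominated kernel still potentiates tightly ordered pre-before-post pairs, but any loosely timed activity is penalized, which is the sense in which such a rule demands precise timing from the circuit.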

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      The authors report the first use of the bacterial Tus-Ter replication block system in human cells. A single plasmid containing two divergently oriented five-fold TerB repeats was integrated on chromosome 12 of MCF7 cells. ChIP and PLA experiments convincingly demonstrate the occupancy of Tus at the Ter sites in cells. Using an elegant Single Molecule Analysis of Replicated DNA (SMARD) assay, convincing data demonstrate the replication block at Ter sites dependent on the presence of the protein. As an orthogonal method to demonstrate fork stalling, ChIP data show the accumulation of the replicative helicase component MCM3 and the repair protein FANCM around the Ter sites. It is unclear whether the Ter sites integrated by a single copy plasmid have any effect on the replication of this region but the data show that the observed effects are dependent on expression of the Tus protein. The SMARD data do not reveal what proportion of forks are arrested at Tus/Ter, or how long the fork delay is imposed. Fork stalling led to a highly localized gammaH2AX response, as monitored by ChIP using primer pairs spread along the integrated plasmid carrying the Ter sites. This response was shown to be dependent on ATR using the ATR inhibitor VE-822. This contrasts with a single Cas9-induced DSB between the two Ter sites, which causes a more spread gammaH2AX response. While this was monitored only at a single distal site, the difference between the DSB and the Tus-induced stall is very significant. Interestingly, despite evidence for ATR activation through the gammaH2AX response, no evidence for phosphorylation of ATR-T1989, CHK1-S345, or RPA2-S33 could be found under fork stalling conditions. The global replication inhibitor hydroxyurea (HU) elicited phosphorylation of ATR-T1989, CHK1-S345, or RPA2-S33. In this context, it would have been of interest to examine if a single DSB in the Ter region leads to phosphorylation of ATR-T1989, CHK1-S345, or RPA2-S33 and cell cycle arrest. It is not shown whether the replication inhibitor HU leads to the same widely spread gamma H2AX response. Overall, this is a well written manuscript, and the data provide convincing evidence that the Tus-Ter system poses a site-specific replication fork block in MCF7 cells leading to a localized ATR-dependent DNA damage checkpoint response that is distinct from the more global response to HU or DSBs.

      Author response to public review:

      “It is unclear whether the Ter sites integrated by a single copy plasmid have any effect on the replication of this region but the data show that the observed effects are dependent on expression of the Tus protein.”

      -The lack of perturbation of fork progression by the TerB sequence has been studied extensively in both Willis et al., 2014 and Larsen et al., 2014. Furthermore, as the detection of the SMARD signal at the TerB sites is dependent on the 7.5 kb probe that spans the TerB sites (orange probe, Fig 2B & 2D), it would be impossible to study the effect on replication in this region with and without the integration of the single copy plasmid.

      “The SMARD data do not reveal what proportion of forks are arrested at Tus/Ter, or how long the fork delay is imposed.”

      -The percentage of fork stalling at the TerB sites, with and without Tus expression, has been quantified in Figure 2E & 2F. Essentially, 36% of forks stall at the TerB block when the Tus-TerB block is active: 18% in the 5’ to 3’ (orange) direction and 18% in the 3’ to 5’ (blue) direction.

      “It is not shown whether the replication inhibitor HU leads to the same widely spread gamma H2AX response.”

      -While we have not shown gH2AX accumulation via ChIP after HU treatment, Supplementary Figure 5A & 5B clearly show increased gH2AX foci when the cells are treated with HU, suggesting a global replication stress response that is in stark contrast to the response to Tus-TerB.

      Recommendations for the authors:

      Lines 78, 95: In the experimental set-up there are two divergent 5-TerB sites in the orientation that is non-permissive for fork progression regardless of direction. This raises an obvious question: How is the intervening (~1 kb-long) DNA segment replicated? Does it stay under-replicated and then break?

      -The reviewers pose an important question about how the intervening sequence flanked by the two TerB sites is replicated, and if this leads to formation of anaphase bridges resulting in breaks. We think this is very plausible and this very question is part of ongoing studies in the lab with the aim to understand how the cell resolves a site-specific block. Unfortunately, this falls outside the scope of the current study.

      Also, it is unclear what is meant by non-permissive orientation. This depends on the predominant replication direction. As the construct has Ter repeats in opposite orientations, any direction is non-permissive. These descriptions could be rephrased to avoid confusion.

      -The text has been edited to clarify this.

      Fig 1A: It would be helpful to annotate the map to show the position of each primer relative to the Ter array. Why is there no signal for pp52?

      -Figure 1A has the map of the locus with the annotated primer pairs and their relative positions to the TerB array.

      -pp52 is positioned beyond the TerB array so binding of the Tus-His protein there is unlikely, confirming the specificity of the Tus binding to only the TerB array and not to the adjacent chromatin.

      Figure 1B: Change Tus to Tus-His to make it easier to understand that the anti-His ChIP is targeting Tus. Provide information what normalization method was used in the ChIP experiments.

      -Figure 1B has been edited to reflect this change

      Line 113: Willis et al. 2014 also worked with chromosomal Ter sites, which should be acknowledged here.

      -The text has been modified to indicate this. We apologize for the oversight.

      Line 126: Define pWB15 and its significance in text.

      -The text has been edited to clarify this and mentions pWB15.

      Figure 2E, F: Define legend (blue, orange boxes and arrow heads).

      -The figure legend corresponding to Figure 2 has a detailed description of the boxes and the arrows.

      Figure 3E, 4C: Add map of primers like in Figures 1 and 2.

      -The map has been added to Figures 3 & 4 and the text has been updated.

      Figure 4: Showing that the gammaH2AX response is spread like with the single DSB would bolster the conclusion about the difference between a local and global response. Fig 4A, Lane-3: A loading control for the chromatin fraction is missing.

      -Measuring gH2AX chromatin spread after global replication stress can be challenging. We have tried to address the question of global and local gH2AX response after replication stress by quantifying gH2AX foci in cells treated with and without hydroxyurea, and comparing them with cells that have a functional Tus-TerB block (Supplementary Figure 5A & 5B). A single fork block seems to elicit only a local response, while global replication stress leads to gH2AX accumulation throughout the cell.

      -Lamin A/C has been added to Fig 4A as a loading control for the chromatin fraction.

      Figure S4: Analyzing ATR, CHK1 and RPA phosphorylation as well as cell cycle profile under single DSB condition may reveal that different localized responses exist. I mention this because it was reported in yeast that a single DSB in G1 cells leads to a similarly localized Mec1 (ATR)-dependent response that does not elicit phosphorylation of Rad53 (CHK1) and other downstream targets, but leads to H2A phosphorylation as well as phosphorylation of RPA and the Rad51 paralog Rad55 (see PMCID: PMC2853130). It might be of interest to the reader to discuss this publication and the commonalities and differences between both localized checkpoint responses.

      -The reviewers raise an interesting question about the phosphorylation of ATR/CHK1/RPA and its effect on cell cycle after a single DSB. The aim of using the Cas9 break site in this study was merely to corroborate previously published observations pertaining to the spread of gH2AX after a DSB and to contrast that with the local response seen with Tus-TerB. Thus, while this is an intriguing question, we do not think this particular experiment will help in understanding the localized checkpoint response after a single replication fork block. However, we have included the observations previously published in the yeast system (PMC2853130) in our discussion, as they help to compare and contrast fork blocks and DSBs further. It is worth noting, though, that the yeast studies were looking at the cellular response to a DSB in G1.

      Lines 256-260: In the discussion of ATRIP, unpublished data are discussed that show no increase in ssDNA. What is the effect of ATRIP depletion? Maybe delete this mention of unpublished data, if no new data can be provided. The authors are aware that this makes the mechanism of ATR activation at the 5-TerB site elusive.

      -This statement has been deleted and the text has been modified.

      Another possibility discussed by the authors is fork reversal. Since the Tus/Ter complex blocks CMG progression, fork reversal would result in a chicken foot structure with the long single-stranded 3'-overhang of an Okazaki fragment site. Such a structure should be protected by BRCA2 or RAD52 proteins from degradation. Is there any role for these proteins in the checkpoint activation at the TerB site?

      -The reviewers suggest an interesting scenario where the reversed fork structure induced by the Tus-TerB block could be protected by the loading of known DNA repair proteins, and this in turn could lead to a signaling mechanism and checkpoint activation. While we have not tested this hypothesis, nor studied the temporal dynamics of the formation of the reversed fork with respect to gH2AX accumulation, we think the localized gH2AX signal observed in the vicinity of the block is what initiates the downstream DDR response, promoting fork stabilization, followed either by fork reversal and restart or fork collapse. If the reversed fork was responsible for the gH2AX signaling, one would envision the spread to be more widespread, perhaps decorating the entire stretch of DNA between the block and the reversed fork. However, further studies are warranted to tease out this mechanism and the spatio-temporal dynamics.

      Lines 292-294: The authors state that "unpublished work from our laboratory has demonstrated that replication forks are cleaved at or near the TerB site..." Unless the data are shown, it might be best to eliminate discussion of unpublished work, also because the occurrence of DNA ends at Ter sites was already described in Willis et al. 2017.

      -The statement has been deleted and Willis et al. 2017 has been referenced.

      Suppl Table 1: It would help to also show representative images of stretched fibers in addition to the summary data shown.

      -Since the data is negative, the fiber images do not show any discernible differences and we do not think it adds useful information.

      Suppl Fig 4. ChIP for gamma H2AX data. It would be helpful to show the distribution of the gamma H2AX signal along the chromosome for both the DSB response and the Tus/Ter response.

      -The gH2AX ChIP signal at PP0-2 and PP10 has been included in Supplementary Fig4D. Though not significant for PP0-2, the data strongly suggests that there is increased spread of gH2AX along the chromosome after a DSB, strongly contrasting with the response after Tus-TerB block. The text has been modified to include both primer pairs.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Response to Reviewer comments

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary

      The manuscript by Parker et al addresses the important question of how different organisms have evolved pre-messenger RNA systems that are either more or less complex. This question underlies the evolution of complex organisms and the genome adaptation of simple organisms to their specific environments, so is an important question to answer. This manuscript now provides the underlying molecular mechanisms of how 5' splice site sequence preference may have evolved which is both an interesting and exciting advance for the field.

      We thank the reviewer for these kind comments.

      Major comments

      This manuscript builds on the previous work from this group where they identified the role of adenosine N6 methylation (m6A) of the U6 small nuclear RNA (snRNA) of the spliceosome by METTL16 as being important for 5' splice site selection. This work led to the speculation that loss of a METTL16 ortholog, or potentially other splicing factors, in some species could contribute to an evolutionary change in 5' splice site sequence preference. Here the authors now use the power of phylogenetics, interspecies association mapping and the available spliceosome structures to provide convincing conclusions that 5' splice site sequence preferences in the extensive number of organisms examined correlate with the presence of the U6 snRNA methyltransferase METTL16 and the splicing factor SNRNP27K.

      An analysis of METTL16 conservation was first carried out by comparing the METTL16 methyltransferase domain (MTD) in 29 diverse eukaryotic species. All the METTL16 orthologs were found to have either one or two globular domains. Three domain types were identified and compared in detail. What was not clear from this analysis was the functional significance of orthologs having either one or two domains.

      We identified several species, including Drosophila melanogaster, whose METTL16 orthologs do not contain a VCR domain. However, in this study we do not draw specific conclusions about the functional significance of orthologs having different domain topologies.

      In addition, while this analysis provides important new information on the domain structure of METTL16 orthologs, especially where these domains had not been identified previously, the link between this section of the results and the following sections is not that apparent.

      We agree that there is a significant difference in approach between the first section of the Results and the following sections. However, we are keen to keep this part of the manuscript because it provides an orthogonal line of evidence suggesting that the ancestral role of METTL16 in eukaryotes is specifically the methylation of U6 snRNA.

      Next, novel bioinformatics pipelines were developed to compare both introns and orthologous groupings of protein coding genes between 227 Saccharomycotina genomes as well as 13 well-annotated eukaryote genomes. First, the 5' splice site sequence preference was compared and clearly indicates that the +4 position has the greatest variation in preferences within the Saccharomycotina. The ability to now compare a large number of genomes has provided novel information on the evolution of the 5' splice site sequence and the conclusion that there is more complexity to the 5' splice site in fungi than previously recognized. While it is apparent why only the 5' splice site signal was investigated here, with its relationship to the U6 snRNA and METTL16, it seems a shame the other splice site sequences were not analyzed using this novel pipeline. In any case, the complexity of the 5' splice site +4 position now allows, for the first time, interesting interspecies association studies.

      We have now included the variance plots for 3’SS motifs (analogous to the 5’SS variance plots shown in Figure 2B) as Figure 2 supplementary figure 4A, and a traitgram for 3’SS -3C to U ratio as Figure 2 supplementary figure 4B. We have included a short section of text in the Results section to describe these additional findings.

      With the 5' splice site +4 variation identified, the next step was to determine the underlying molecular mechanisms that dictate the evolution of the various sequence preferences. Some obvious players here are the U1 and U6 snRNAs which directly interact with the 5' splice site during splicing. However, no association was found between these snRNAs and the 5' splice site +4 sequence.

      The powerful interspecies association mapping was then used to determine whether the presence or absence of a METTL16 ortholog or a splicing factor correlated with the 5' splice site +4 sequence variation. Interestingly, a clear association was found between METTL16 and the 5' splice site +4 position; METTL16 presence was associated with +4A at the 5' splice site and METTL16 absence was associated with +4U at the 5' splice site. This is an exciting and significant finding.

      We thank the reviewer for these comments on the importance of this study.

      Interestingly, the next most significant association with the 5' splice site +4 position was with SNRNP27K. This result makes sense as in the cryo-EM structure of the pre-B spliceosome complex the C-terminal domain of SNRNP27K is found near the region of the U6 snRNA that will interact with the 5' splice site. Absence of SNRNP27K was associated with an increased preference for +4U at the 5' splice site. Now the real power of the interspecies association mapping was demonstrated by investigating whether any association could be determined specifically within the C-terminus of SNRNP27K. Significantly, the methionine 141 position in SNRNP27K was found to be associated with the +4 position of the 5' splice site. This finding fits nicely with previous studies where mutation of M141 caused a shift in 5' splice site selection away from +4A 5' splice sites, to 5' splice sites without +4A. What is not clear is whether M141 is conserved or invariant between all the species that were compared?

      M141 is not completely conserved across the species that were compared for the SNRNP27K C-terminus analysis. We did not test positions with very strong sequence conservation, because without variation in both the genotype and phenotype it is not possible to test for an association. We have rephrased the relevant Results and Methods sections to make this point clearer. In addition, we have incorporated a sequence logo to illustrate the degree of conservation of each position in the SNRNP27K C-terminal domain as Figure 5 -figure supplement 1A. Finally, we have included an additional box-plot to illustrate the finding that species which have lost SNRNP27K or have only lost the Methionine equivalent to human SNRNP27K position 141, show a similar preference for +4U at 5’ SSs. This is now included as Figure 5 - figure supplement 1B.

      Overall, this result reveals the power of the interspecies association approach and provides interesting and exciting information on the molecular determinants of 5' splice site evolution.

      We are grateful to the reviewer for these comments.

      The final analysis was to investigate the interaction potentials of the U5 and U6 snRNAs with the 5' splice site in the Saccharomycotina genomes and try to relate this to species with fewer introns and less alternative splicing. Species with low intron numbers and low splicing complexity were revealed to have weaker U5 and U6 anti-correlation potentials and favor +4U at the 5' splice site. On the other hand, species with high intron number and presumably higher splicing complexity featured anti-correlated U5 and U6 snRNA interaction potentials and favored +4A 5' splice sites. This extensive analysis provides novel information on the interactions and splice site properties of species with simple and complex splicing. Again, I see why there is emphasis on the 5' splice site here but a similar analysis with the U2 snRNA and the branch site could also be informative.

      We absolutely agree that inter-species association mapping could be applied to other splicing signal phenotypes including 3’ splice sites and intron branchpoints. Accordingly, we raise this subject in the final section of the Discussion. However, branchpoint sequences are challenging to predict with genomic data. Because preliminary analyses suggest independent variation in these other splicing signal phenotypes, we feel a separate focused study is required to properly explain (and substantiate) even the analytical approaches involved. We hope the reviewer would agree that incorporating U2 snRNA and branchpoint variation analyses into this manuscript as well, could detract from the clarity of the conceptual advances that we make here.

      Minor comments

      Should the Title include SNRNP27K?

      We have included SNRNP27K in the revised title.

      Should the title specify that it is the evolution of only the 5’ splice site sequence preference being studied here?

      Because apostrophes in titles can compromise some scholarly online search engines (https://insights.uksg.org/articles/10.1629/uksg.534), we would prefer not to include 5’ in the title.

      Include information on intron number and 5’ splice site interaction potential of U5 and U6 snRNA in the Summary?

      We thank the reviewer for this suggestion. We have updated the Summary to include our findings on U5 and U6 interaction potential in species with reduced intron number.

      Figure 1C is not referred to in the text?

      We apologise for this oversight. We have added references to figure 1C in the appropriate Results section.

      Page 8, line 5 – better to say “splicing signal phenotypes”.

      We have amended this statement on Page 8 and at other places in the text where related phrasing was used.

      What are the other points on Figure 3B? What is the next point below SNRNP27K? Is it U2A’?

      The other points on Figure 3B represent Orthofinder orthogroups which contain human orthologs that are known components of the spliceosome. The list of spliceosomal components was taken from Sales-Lee et al. 2021. The third most significant point is indeed the orthogroup containing the human ortholog of U2A’. As we state in the text, however, the correlation of U2A’ with the 5’SS+4 A to U ratio phenotype is no longer significant once METTL16 presence/absence is controlled for, indicating that the correlation of U2A’ with the +4A phenotype is likely explained by similarity in the patterns of gene loss of U2A’ and METTL16.

      The second paragraph of the Discussion is vague and lacks a reference. “we could also identify an association with a methionine residue in the conserved C-terminal domain of SNRNP27K orthologs.” There are a few methionines in the C-terminus, which one? Please reference the statement “transcriptome analysis of C. elegans SNRP-27 M141T mutants..”

      We apologise for the lower quality of writing in this section of the Discussion. We have updated the text, made the statements about the SNRNP27K C-terminus less ambiguous, and added the relevant citations as appropriate.

      Reviewer #1 (Significance (Required)):

      Overall, this is a well written and clearly presented study that provides some key molecular information on the splicing factors involved in the evolution of 5’ splice sites and shows the power of interspecies association studies. Some important conceptual principles have now been defined for the field going forward.

      We thank the reviewer for this kind comment on the importance of this work.

      The question remains as to whether METTL16 and SNRNP27K are the sole determinants of 5’ splice site preference evolution at +4?

      We cannot say for certain that METTL16 and/or SNRNP27K determine the 5’SS +4 phenotype – only that they are correlated with it. In our response to reviewer 3, and in a new Discussion section, we have detailed some of the scenarios that could explain these correlations. We also cannot rule out whether there are changes in the presence/absence (or domain/sequence-level changes) of other, untested proteins that correlate with the 5’SS +4 phenotype and we allude to this in the final section of the Discussion.

      One splicing factor that immediately comes to mind is Prp8 where there is extensive evidence for involvement in splice site selection and is clearly in the right location throughout splicing to be involved. This question should at least be discussed but Prp8 would also be a very interesting candidate for the interspecies association mapping.

      Prp8 is a core component of spliceosomes and is conserved throughout the Saccharomycotina. For this reason, we were unable to associate splicing phenotypes with Prp8 presence or absence variation at the level of orthogroups. However, we revisited this question posed by the reviewer. Our experience with inter-species association mapping, so far, indicates it works well with orthogroup presence/absence or when straightforward amino acid substitutions can be detected in conserved and hence alignable protein sequence domains. We analysed the conserved U6 snRNA-interacting region of the Prp8 linker domain, which maps close to the 5’ splice site in cryo-EM models, using the profile HMM PF10596 available from Pfam. We found that the majority of this domain was extremely highly conserved with variation in only a few species and positions. The strongest correlation with the +4A to U ratio phenotype was at position 58, which is conserved as a Glycine in all but 8 species (6 Dipodascaceae, 2 CUG-Ser1), which also tend to have a stronger preference for +4A. However, examination of the species contributing to this result (and to similar results at other positions) indicated that in the 6 Dipodascaceae species, this change is part of a larger deletion or replacement that makes the whole linker region align poorly to the model. Hence, the G58 position itself may not be specifically important for the +4 phenotype. Although the wholesale loss or replacement of the U6 snRNA-interacting region in these species is potentially interesting, these larger scale structural changes in a small number of species are difficult to interpret. Therefore, to maintain the focus of the manuscript and the clear links to METTL16 and SNRNP27K that have orthogonal support, we have decided not to add these results to the manuscript but present them here (Figure not available on bioRxiv commenting window).

      Also, as mentioned previously, only the 5’ splice site was investigated here and the manuscript could become a more substantial piece of work if the other splice sites were included in some way.

      We agree that it will be exciting to apply this approach to other splicing signal phenotypes and in other phylogenetic clades with emerging tree-of-life-scale genomics data. We have included variation in 3’ splice sites in the revised manuscript. As the first of its kind, this study should pioneer a wider use of this approach, by us and others, to understand the mechanisms and functions of molecular interactions not only in splicing but in other areas of biology too.

      The obvious audience here are those directly in the splicing field but the overall principles are relevant for evolutionary biologists and those studying organismal complexity.

      We thank the reviewer for recognising the broad importance of this work.

      My expertise is in yeast and human splicing mechanisms. I do not have the expertise to critically evaluate the bioinformatic pipelines but they were clearly explained and presented.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their manuscript, Parker et al. investigate the evolutionary patterns of splice site preference, focusing on the A/U ratio at position A+4 on the 5´ splice site. Building upon prior studies in S. pombe and A. thaliana, the authors establish a strong correlation between this preference and the co-evolution of the METTL16 U6 snRNA methyltransferase. Furthermore, through inter-species association mapping, they identify the involvement of the splicing factor SNRNP27K in altered A/U ratios and highlight the significance of the residue Met-141 in SNRNP27K for this function. Overall, the paper effectively presents impactful new findings on the evolution of METTL16, U6 snRNA, and splicing.

      We thank the reviewer for these kind comments on the importance of our study.

      The computational analyses employed in this study are situated outside our field of expertise, preventing us from offering a comprehensive evaluation of the methodology’s appropriateness and rigor. Nonetheless, the identification of METTL16 through the authors’ methods, which aligns with previous research in S. pombe and A. thaliana, lends support to the validity of their approach. Notably, the close proximity between SNRNP27K and the methylated A43 residue in U6 snRNA within the spliceosome, particularly near Met-141, is an impressive finding. Previous studies have shown that a mutation at position M141T affects splicing at +4A introns, thus providing robust validation for their methods.

      We thank the reviewer for these kind comments on our work.

      The data presented in this study furnish crucial insights into the role of METTL16, U6 snRNA methylation, and splice site recognition. The authors expand upon recent observations that the “vertebrate conserved region” exists in non-vertebrates, despite the absence of primary sequence homology. These results will serve as a valuable guide for future molecular investigations into U6 snRNA methylation and its mechanisms in splicing. Furthermore, the implications of this paper extend to human evolution, as the plasticity in splicing is an essential factor in the evolution of developmental complexity.

      We thank the reviewer for these kind comments.

      Minor suggestions for improvement:

      1. Given the significance of the interaction between U6 snRNA and the intron for understanding the data, it would be beneficial to include a figure illustrating the RNA-RNA base-pairing interactions between U6 snRNA and the 5´ splice site. This addition is particularly important if the paper is intended for publication in a journal with a general readership.

      We thank the reviewer for this excellent suggestion. We have included this as Figure 3A.

      Similarly, the section on U1 snRNA would be more comprehensible with the inclusion of U1 RNA-RNA intron diagrams and improved descriptions of both the figures and the assay. Despite being negative data in the supplement, clarifying this section is essential. As currently written, it is challenging to follow.

      We agree that this section is difficult to follow. We have updated the text to improve the readability and included a figure of U1 snRNA:5’SS basepairing as Figure 3 – figure supplement 1A.

      Whenever possible, consider increasing the figure and font sizes to enhance readability for readers.

      We agree that some of the more complex figures can be difficult to read when embedded into a Word document/pdf. We hope that providing high-resolution figures for reading online will mitigate this.

      In the text, there is no reference to Figure 1C.

      We apologise for this oversight. We have resolved this issue with the appropriate references in the Results text.

      In Figure 5B, the y-axis in the top panel is labelled “species,” but the legend only mentions U5/6p as the y-axis. Please revise the legend to include the appropriate information.

      We apologise for the confusion caused by our poorly written legend for this plot. We have updated the legend so that the text clearly refers to either the scatter plot or the marginal histograms.

      Reviewer #2 (Significance (Required)):

      The data presented in this study furnish crucial insights into the role of METTL16, U6 snRNA methylation, and splice site recognition. The authors expand upon recent observations that the “vertebrate conserved region” exists in non-vertebrates, despite the absence of primary sequence homology. These results will serve as a valuable guide for future molecular investigations into U6 snRNA methylation and its mechanisms in splicing. Furthermore, the implications of this paper extend to human evolution, as the plasticity in splicing is an essential factor in the evolution of developmental complexity.

      We are grateful to the reviewer for these kind comments on the importance of this work.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Parker et al present a nice exploration of the evolutionary and mechanistic relationships between 5′ splice site consensus sequences, intron numbers and METTL16/SNRNP27K. By performing inter-species association mapping in Saccharomycotina species, they found that a T in position +4 is strongly associated with the absence of METTL16 (and/or in some cases SNRNP27K or mutations in it). They also provide solid structural modelling data in support of this association.

      In general, I think this is a very nice manuscript. I only have a few comments, which could be addressed by rewording specific parts and/or improving the current figures.

      We are grateful to the reviewer for the kind comments on this work.

      1) As the authors acknowledge, a key issue that cannot be fully resolved in this study is causality between the different events investigated. Overall, the authors are careful about this, but there are some exceptions that should be corrected. Probably the most important is in the abstract, where they write: “We conclude that variation in concerted processes of 5’ splice site selection by U6 snRNA is crucial to evolutionary change in splicing complexity”. I suggest they write something more open (and correct), such as: “We conclude that variation in concerted processes of 5’ splice site selection by U6 snRNA is associated with evolutionary changes in splicing complexity”. Similarly, other plausible scenarios should be discussed in the corresponding Discussion section.

      We agree with the reviewer that it is not possible to infer the causal relationship between METTL16 absence and 5’SS+4 preference change from the current data. We, therefore, apologise for failing to be more careful in the Summary and Introduction. We have reworded these statements to better reflect what we can currently say about the evolutionary relationship between METTL16 and 5’SS sequence preference.

      The correlation between METTL16 absence and 5'SS+4 sequence preference change could most likely be explained by one of several scenarios: (a) Sudden loss of METTL16 causes a rapid necessity to change 5'SS sequence preferences. This is unlikely as such rapid change without widespread corresponding 5'SS changes would likely impose a high fitness cost. (b) Changes in 5'SS sequence preference occur first, driven by some other selective pressure, until there is no longer a benefit to retaining the METTL16 gene. (c) Gradual changes in the expression or catalytic efficiency of METTL16 reduce the stoichiometry of U6 snRNA m6A modification, which permits gradual change in 5'SS+4 sequence preference until complete loss of METTL16 no longer imposes a major fitness cost. As we suggest in the Discussion, future work could examine this question by determining whether the METTL16 orthologs found in Zygosaccharomyces and Eremothecium species, which have altered their 5'SS+4 preference to a U, are expressed and functional. We have updated the Discussion to include a new section that addresses these scenarios.

      2) I do not agree with the statement that "The extent of alternative splicing is the best genomic predictor of developmental complexity". To start with, there are many ways to quantify "extent of alternative splicing" and there are also different types of alternative splicing that might have different prevalence and biological impact. Then, this claim is usually related to exon skipping, which is tightly linked with intron length, and that is likely a better predictor of complexity (yet clearly not causative). My concern is: to what extent has this claim been formally and properly assessed by comparing splicing prevalence with other genomic features, such as intergenic region length, intron length, or average distance between enhancer-promoter interactions (arguably the most relevant predictor, in light of many other studies)? Moreover, I found it a bit misleading to frame the work presented in this study as directly related to developmental (or even splicing) complexity. The work is very interesting on its own, and I doubt their findings on +4 position preference in Saccharomycotina have anything to do with developmental complexity (as the Abstract and Introduction seem to imply).

      On reflection, we agree with the reviewer. Some of our framing of the text isn’t balanced with other studies on the scaling of alternative splicing with developmental complexity. We have edited the Summary and Introduction sections accordingly and cited other references that broaden the consideration of this subject. We are grateful to the reviewer for this suggestion because the changes we make improve the focus of the manuscript since our findings relate more to splicing simplification than to an understanding of increased developmental complexity.

      3) I found Figure 2 and its associated supplementary figure very difficult to follow. I suggest the authors try to improve it and make it clearer. Also, other trees summarizing the results might be helpful.

      We apologise for the complexity of these figures. We opted to show phylogenetic trees with phenotypes plotted on the y axis, rather than simply trait histograms or box-plots, because the underlying structure of the tree is important for demonstrating that multiple independent changes in the 5’SS phenotype have occurred in the Saccharomycotina. We have tried to improve the comprehensibility of the figures in the following ways: (a) We have added 5’SS sequence motifs to the x-axis of figure 2B to make what the plot represents clearer, (b) as suggested by the reviewer, we have created a pruned tree showing the 5’SS motifs of a selection of Saccharomycotina species, which demonstrates that the changes in 5’SS+4 position preferences seen in S. cerevisiae and C. albicans are likely to be a result of convergent evolution. We have added this tree as Figure 2 - figure supplement 3.

      4) I also found the Results section corresponding to Figure 5B a bit confusing. I would argue (as I think the authors do) that there are two main patterns here: below 500 introns, there is no association, while above 500 introns there is an increasingly negative association (correlation). I think it would help to more explicitly distinguish these two patterns. Then, for the intron-poor species: is the correlation (or lack of) for species with a T or an A in position +4 different?

      We do indeed think that there are two patterns here, as indicated by the reviewer. In the previous version of the manuscript, we separated species into those having an overall preference for A at the +4 position, and those having +4U. By showing regression lines for these two classes, rather than for the general relationship between intron number and U5/6rho, we somewhat imply that the switch in +4 base preference might be causing the loss of correlation between U5/6rho and intron number. However, since essentially all species with a 5'SS +4U preference are intron poor, it seems more likely that these trends are the result of a loss of the negative correlation between intron number and U5/6rho in intron poor species, as suggested by the reviewer. To address this issue, we have replaced the regression lines on Figure 6B with a single loess (locally estimated scatterplot smoothing) regression line for all species and updated the text to make it clearer that we think loss of U5/6rho and +4A preference are separate traits of intron poor species. Although this is not exactly what the reviewer requested, we hope that it satisfies their issue with the analysis.
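
      As an illustration only, and not the code used to draw the figure, the idea behind a single loess line can be sketched in Python: for each value of x, a straight line is fitted by weighted least squares to the nearest fraction of the data, with weights that decay with distance. The function names, the `frac` span of 0.5 and the tricube kernel are assumptions of this sketch; the response does not state which loess settings were used.

```python
import math


def tricube(u):
    """Tricube kernel: weight for a scaled distance u (zero once |u| >= 1)."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_point(x0, xs, ys, frac=0.5):
    """Evaluate a locally weighted linear fit at x0.

    xs, ys: paired observations (hypothetically, intron number and the
            U5/U6 correlation value for each species); must be non-empty.
    frac:   fraction of points contributing to each local fit
            (an arbitrary choice for this sketch).
    """
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))
    # The distance to the k-th nearest neighbour sets the local bandwidth.
    # It is inflated very slightly so that neighbour keeps a non-zero weight.
    dists = sorted(abs(x - x0) for x in xs)
    h = dists[k - 1] * (1.0 + 1e-9) or 1e-12
    w = [tricube((x - x0) / h) for x in xs]

    # Weighted least-squares sums for a straight line y = a + b * x.
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))

    denom = sw * swxx - swx * swx
    if abs(denom) <= 1e-12 * sw * swxx:
        # No usable spread in x near x0: fall back to the weighted mean.
        return swy / sw
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0
```

      Calling `loess_point(x, xs, ys)` for every x of interest traces out one smooth curve for all species, which avoids fitting separate regression lines to predefined classes.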

      Reviewer #3 (Significance (Required)):

      This is a very interesting study that sheds light on an intriguing evolutionary pattern: the change in consensus sequence at position +4 of the 5' splice site. This topic is relevant since it is closely associated with intron loss and splicing efficiency and evolution.

      We thank the reviewer for the kind and constructive comments on this study.

    2. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The manuscript by Parker et al addresses the important question of how different organisms have evolved pre-messenger RNA systems that are either more or less complex. This question underlies the evolution of complex organisms and the genome adaptation of simple organisms to their specific environments, so is an important question to answer. This manuscript now provides the underlying molecular mechanisms of how 5' splice site sequence preference may have evolved which is both an interesting and exciting advance for the field.

      We thank the reviewer for these kind comments.

      Major comments

      This manuscript builds on the previous work from this group where they identified the role of adenosine N6 methylation (m6A) of the U6 small nuclear RNA (snRNA) of the spliceosome by METTL16 as being important for 5' splice site selection. This work led to the speculation that loss of a METTL16 ortholog, or potentially other splicing factors, in some species could contribute to an evolutionary change in 5' splice site sequence preference. Here the authors now use the power of phylogenetics, interspecies association mapping and the available spliceosome structures to provide convincing conclusions that 5' splice site sequence preferences in the extensive number of organisms examined correlate with the presence of the U6 snRNA methyltransferase METTL16 and the splicing factor SNRNP27K. 

      An analysis of METTL16 conservation was first carried out by comparing the METTL16 methyltransferase domain (MTD) in 29 diverse eukaryotic species. All the METTL16 orthologs were found to have either one or two globular domains. Three domain types were identified and compared in detail. What was not clear from this analysis was the functional significance of orthologs having either one or two domains.

      We identified several species, including Drosophila melanogaster, whose METTL16 orthologs do not contain a VCR domain. However, in this study we do not draw specific conclusions about the functional significance of orthologs having different domain topologies.

      In addition, while this analysis provides important new information on the domain structure of METTL16 orthologs, especially where these domains had not been identified previously, the link between this section of the results and the following sections is not that apparent.

      We agree that there is a significant difference in approach between the first section of the Results and the following sections. However, we are keen to keep this part of the manuscript because it provides an orthogonal line of evidence suggesting that the ancestral role of METTL16 in eukaryotes is specifically the methylation of U6 snRNA.

      Next, novel bioinformatics pipelines were developed to compare both introns and orthologous groupings of protein coding genes between 227 Saccharomycotina genomes as well as 13 well-annotated eukaryote genomes. First, the 5' splice site sequence preference was compared and clearly indicates that the +4 position has the greatest variation in preferences within the Saccharomycotina. The ability to now compare a large number of genomes has provided novel information on the evolution of the 5' splice site sequence and the conclusion that there is more complexity to the 5' splice site in fungi than previously recognized. While it is apparent why only the 5' splice site signal was investigated here, with its relationship to the U6 snRNA and METTL16, it seems a shame the other splice site sequences were not analyzed using this novel pipeline. In any case, the complexity of the 5' splice site +4 position now allows, for the first time, interesting interspecies association studies.

      We have now included the variance plots for 3’SS motifs (analogous to the 5’SS variance plots shown in Figure 2B) as Figure 2 supplementary figure 4A, and a traitgram for 3’SS -3C to U ratio as Figure 2 supplementary figure 4B. We have included a short section of text in the Results section to describe these additional findings.

      With the 5' splice site +4 variation identified, the next step was to determine the underlying molecular mechanisms that dictate the evolution of the various sequence preferences. Some obvious players here are the U1 and U6 snRNAs which directly interact with the 5' splice site during splicing. However, no association was found between these snRNAs and the 5' splice site +4 sequence. 

      The powerful interspecies association mapping was then used to determine whether the presence or absence of METTL16 ortholog or a splicing factor correlated with the 5' splice site +4 sequence variation. Interestingly, a clear association was found between METTL16 and the 5' splice site +4 position; METTL16 presence was associated with +4A at the 5' splice site and METTL16 absence was associated with +4U at the 5' splice site. This is an exciting and significant finding.

      We thank the reviewer for these comments on the importance of this study.

      Interestingly, the next most significant association with the 5' splice site +4 position was with SNRNP27K. This result makes sense as in the cryo-EM structure of the pre-B spliceosome complex the C-terminal domain of SNRNP27K is found near the region of the U6 snRNA that will interact with the 5' splice site. Absence of SNRNP27K was associated with an increased preference for +4U at the 5' splice site. Now the real power of the interspecies association mapping was demonstrated by investigating whether any association could be determined specifically within the C-terminus of SNRNP27K. Significantly, the methionine 141 position in SNRNP27K was found to be associated with the +4 position of the 5' splice site. This finding fits nicely with previous studies where mutation of M141 caused a shift in 5' splice site selection away from +4A 5' splice sites, to 5' splice sites without +4A. What is not clear is whether M141 is conserved or invariant between all the species that were compared?

      M141 is not completely conserved across the species that were compared for the SNRNP27K C-terminus analysis. We did not test positions with very strong sequence conservation, because without variation in both the genotype and phenotype it is not possible to test for an association. We have rephrased the relevant Results and Methods sections to make this point clearer. In addition, we have incorporated a sequence logo to illustrate the degree of conservation of each position in the SNRNP27K C-terminal domain as Figure 5 - figure supplement 1A. Finally, we have included an additional box-plot to illustrate the finding that species which have lost SNRNP27K or have only lost the Methionine equivalent to human SNRNP27K position 141, show a similar preference for +4U at 5’ SSs. This is now included as Figure 5 - figure supplement 1B.

      Overall, this result reveals the power of the interspecies association approach and provides interesting and exciting information on the molecular determinants of 5' splice site evolution.

      We are grateful to the reviewer for these comments.

      The final analysis was to investigate the interaction potentials of the U5 and U6 snRNAs with the 5' splice site in the Saccharomycotina genomes and try to relate this to species with fewer introns and less alternative splicing. Species with low intron numbers and low splicing complexity were revealed to have weaker U5 and U6 anti-correlation potentials and favor +4U at the 5' splice site. On the other hand, species with high intron number and presumably higher splicing complexity featured anti-correlated U5 and U6 snRNA interaction potentials and favored +4A 5' splice sites. This extensive analysis provides novel information on the interactions and splice site properties of species with simple and complex splicing. Again, I see why there is emphasis on the 5' splice site here but a similar analysis with the U2 snRNA and the branch site could also be informative.

      We absolutely agree that inter-species association mapping could be applied to other splicing signal phenotypes including 3’ splice sites and intron branchpoints. Accordingly, we raise this subject in the final section of the Discussion. However, branchpoint sequences are challenging to predict with genomic data. Because preliminary analyses suggest independent variation in these other splicing signal phenotypes, we feel a separate focused study is required to properly explain (and substantiate) even the analytical approaches involved. We hope the reviewer would agree that incorporating U2 snRNA and branchpoint variation analyses into this manuscript as well, could detract from the clarity of the conceptual advances that we make here.

      Minor comments

      Should the Title include SNRNP27K?

      There is certainly a case that the title should include SNRNP27K. Our aim was to make the title as short and informative as possible without too many acronyms that need explaining. Since the clearest correlation is with METTL16 and this has broader implications for understanding the role of this enzyme not only in splicing but in possibly modifying other RNA targets too, we think not including SNRNP27K is a suitable compromise. In addition, retaining the current title simplifies the tracking of the manuscript from pre-print through to journal publication.

      Should the title specify that it is the evolution of only the 5’ splice site sequence preference being studied here?

      Because apostrophes in titles can compromise some scholarly online search engines (https://insights.uksg.org/articles/10.1629/uksg.534), we would prefer not to include 5’ in the title.

      Include information on intron number and 5’ splice site interaction potential of U5 and U6 snRNA in the Summary?

      We thank the reviewer for this suggestion. We have updated the Summary to include our findings on U5 and U6 interaction potential in species with reduced intron number.

      Figure 1C is not referred to in the text?

      We apologise for this oversight. We have added references to figure 1C in the appropriate Results section.

      Page 8, line 5 – better to say “splicing signal phenotypes”.

      We have amended this statement on Page 8 and at other places in the text where related phrasing was made.

      What are the other points on Figure 3B? What is the next point below SNRNP27K? Is it U2A’? 

      The other points on Figure 3B represent Orthofinder orthogroups which contain human orthologs that are known components of the spliceosome. The list of spliceosomal components was taken from Sales-Lee et al. 2021. The third most significant point is indeed the orthogroup containing the human ortholog of U2A’. As we state in the text, however, the correlation of U2A’ with the 5’SS+4 A to U ratio phenotype is no longer significant once METTL16 presence/absence is controlled for, indicating that the correlation of U2A’ with the +4A phenotype is likely explained by similarity in the patterns of gene loss of U2A’ and METTL16.
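
      What "controlling for" METTL16 means here can be shown with a deliberately simple Python sketch. It is a stand-in for illustration only: it compares species within METTL16-present and METTL16-absent strata, and it is not the statistical model used in the manuscript, which this response does not describe. The function and the record keys are hypothetical.

```python
from statistics import mean


def stratified_effect(records, factor, control, trait):
    """Mean trait difference (factor present minus absent) within each
    level of a control variable.

    records: list of dicts, one per species, with boolean presence/absence
             entries for `factor` and `control` and a numeric `trait`.
    Returns {True: diff or None, False: diff or None}, keyed by control level.
    """
    effects = {}
    for level in (True, False):
        stratum = [r for r in records if r[control] == level]
        present = [r[trait] for r in stratum if r[factor]]
        absent = [r[trait] for r in stratum if not r[factor]]
        if present and absent:
            effects[level] = mean(present) - mean(absent)
        else:
            # The factor does not vary within this stratum, so no comparison.
            effects[level] = None
    return effects
```

      If a second gene is lost in much the same species as METTL16, a comparison across all species will show a trait difference for that gene, whereas the within-stratum differences returned here will be close to zero. That is the pattern described for U2A’.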

      The second paragraph of the Discussion is vague and lacks a reference. “we could also identify an association with a methionine residue in the conserved C-terminal domain of SNRNP27K orthologs.” There are a few methionines in the C-terminus, which one? Please reference the statement “transcriptome analysis of C. elegans SNRP-27 M141T mutants..”

      We apologise for the lower quality of writing in this section of the Discussion. We have updated the text, made the statements about the SNRNP27K C-terminus less ambiguous, and added the relevant citations as appropriate.

      Reviewer #1 (Significance):

      Overall, this is a well written and clearly presented study that provides some key molecular information on the splicing factors involved in the evolution of 5’ splice sites and shows the power of interspecies association studies. Some important conceptual principles have now been defined for the field going forward.

      We thank the reviewer for this kind comment on the importance of this work.

      The question remains as to whether METTL16 and SNRNP27K are the sole determinants of 5’ splice site preference evolution at +4?

      We cannot say for certain that METTL16 and/or SNRNP27K determine the 5’SS +4 phenotype – only that they are correlated with it. In our response to reviewer 3, and in a new Discussion section, we have detailed some of the scenarios that could explain these correlations. We also cannot rule out whether there are changes in the presence/absence (or domain/sequence-level changes) of other, untested proteins that correlate with the 5’SS +4 phenotype and we allude to this in the final section of the Discussion.

      One splicing factor that immediately comes to mind is Prp8 where there is extensive evidence for involvement in splice site selection and is clearly in the right location throughout splicing to be involved. This question should at least be discussed but Prp8 would also be a very interesting candidate for the interspecies association mapping.

      Prp8 is a core component of spliceosomes and is conserved throughout the Saccharomycotina. For this reason, we were unable to associate splicing phenotypes with Prp8 presence or absence variation at the level of orthogroups. However, we revisited this question posed by the reviewer. Our experience with inter-species association mapping, so far, indicates it works well with orthogroup presence/absence or when straightforward amino acid substitutions can be detected in conserved and hence alignable protein sequence domains. We analysed the conserved U6 snRNA-interacting region of the Prp8 linker domain, which maps close to the 5’ splice site in cryo-EM models, using the profile HMM PF10596 available from Pfam. We found that the majority of this domain was extremely highly conserved with variation in only a few species and positions. The strongest correlation with the +4A to U ratio phenotype was at position 58, which is conserved as a Glycine in all but 8 species (6 Dipodascaceae, 2 CUG-Ser1), that also tend to have a stronger preference for +4A. However, examination of the species contributing to this result (and to similar results at other positions) indicated that in the 6 Dipodascaceae species, this change is part of a larger deletion or replacement that makes the whole linker region align poorly to the model. Hence, the G58 position itself may not be specifically important for the +4 phenotype. Although the wholesale loss or replacement of the U6 snRNA-interacting region in these species is potentially interesting, these larger scale structural changes in a small number of species are difficult to interpret. Therefore, to maintain the focus of the manuscript and the clear links to METTL16 and SNRNP27K that have orthogonal support, we have decided not to add these results to the manuscript but present them here (Figure not available in the bioRxiv commenting window).

      Also, as mentioned previously, only the 5’ splice site was investigated here and the manuscript could become a more substantial piece of work if the other splice sites were included in some way.

      We agree that it will be exciting to apply this approach to other splicing signal phenotypes and in other phylogenetic clades with emerging tree-of-life-scale genomics data. We have included variation in 3’ splice sites in the revised manuscript. As the first of its kind, this study should pioneer a wider use of this approach, by us and others, to understand the mechanisms and functions of molecular interactions not only in splicing but in other areas of biology too.

      The obvious audience here are those directly in the splicing field but the overall principles are relevant for evolutionary biologists and those studying organismal complexity.

      We thank the reviewer for recognising the broad importance of this work.

      My expertise is in yeast and human splicing mechanisms. I do not have the expertise to critically evaluate the bioinformatic pipelines but they were clearly explained and presented.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In their manuscript, Parker et al. investigate the evolutionary patterns of splice site preference, focusing on the A/U ratio at position A+4 on the 5´ splice site. Building upon prior studies in S. pombe and A. thaliana, the authors establish a strong correlation between this preference and the co-evolution of the METTL16 U6 snRNA methyltransferase. Furthermore, through inter-species association mapping, they identify the involvement of the splicing factor SNRNP27K in altered A/U ratios and highlight the significance of the residue Met-141 in SNRNP27K for this function. Overall, the paper effectively presents impactful new findings on the evolution of METTL16, U6 snRNA, and splicing.

      We thank the reviewer for these kind comments on the importance of our study.

      The computational analyses employed in this study are situated outside our field of expertise, preventing us from offering a comprehensive evaluation of the methodology’s appropriateness and rigor. Nonetheless, the identification of METTL16 through the authors’ methods, which aligns with previous research in S. pombe and A. thaliana, lends support to the validity of their approach. Notably, the close proximity between SNRNP27K and the methylated A43 residue in U6 snRNA within the spliceosome, particularly near Met-141, is an impressive finding. Previous studies have shown that a mutation at position M141T affects splicing at +4A introns, thus providing robust validation for their methods.

      We thank the reviewer for these kind comments on our work.

      The data presented in this study furnish crucial insights into the role of METTL16, U6 snRNA methylation, and splice site recognition. The authors expand upon recent observations that the “vertebrate conserved region” exists in non-vertebrates, despite the absence of primary sequence homology. These results will serve as a valuable guide for future molecular investigations into U6 snRNA methylation and its mechanisms in splicing. Furthermore, the implications of this paper extend to human evolution, as the plasticity in splicing is an essential factor in the evolution of developmental complexity.

      We thank the reviewer for these kind comments.

      Minor suggestions for improvement:

      1. Given the significance of the interaction between U6 snRNA and the intron for understanding the data, it would be beneficial to include a figure illustrating the RNA-RNA base-pairing interactions between U6 snRNA and the 5´ splice site. This addition is particularly important if the paper is intended for publication in a journal with a general readership.

      We thank the reviewer for this excellent suggestion. We have included this as Figure 3A.

      2. Similarly, the section on U1 snRNA would be more comprehensible with the inclusion of U1 RNA-RNA intron diagrams and improved descriptions of both the figures and the assay. Despite being negative data in the supplement, clarifying this section is essential. As currently written, it is challenging to follow.

      We agree that this section is difficult to follow. We have updated the text to improve the readability and included a figure of U1 snRNA:5’SS basepairing as Figure 3 – figure supplement 1A.

      3. Whenever possible, consider increasing the figure and font sizes to enhance readability for readers.

      We agree that some of the more complex figures can be difficult to read when embedded into a Word document/pdf. We hope that providing high-resolution figures for reading online will mitigate this.

      4. In the text, there is no reference to Figure 1C.

      We apologise for this oversight. We have resolved this issue with the appropriate references in the Results text.

      5. In Figure 5B, the y-axis in the top panel is labelled “species,” but the legend only mentions U5/6p as the y-axis. Please revise the legend to include the appropriate information.

      We apologise for the confusion caused by our poorly written legend for this plot. We have updated the legend so that the text clearly refers to either the scatter plot or the marginal histograms.

      Reviewer #2 (Significance):

      The data presented in this study furnish crucial insights into the role of METTL16, U6 snRNA methylation, and splice site recognition. The authors expand upon recent observations that the “vertebrate conserved region” exists in non-vertebrates, despite the absence of primary sequence homology. These results will serve as a valuable guide for future molecular investigations into U6 snRNA methylation and its mechanisms in splicing. Furthermore, the implications of this paper extend to human evolution, as the plasticity in splicing is an essential factor in the evolution of developmental complexity.

      We are grateful to the reviewer for these kind comments on the importance of this work.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this manuscript, Parker et al present a nice exploration of the evolutionary and mechanistic relationships between 5′ splice site consensus sequences, intron numbers and METTL16/SNRNP27K. By performing inter-species association mapping in Saccharomycotina species, they found that a T in position +4 is strongly associated with the absence of METTL16 (and/or in some cases SNRNP27K or mutations in it). They also provide solid structural modelling data in support of this association.

      In general, I think this is a very nice manuscript. I only have a few comments, which could be addressed by rewording specific parts and/or improving the current figures.

      We are grateful to the reviewer for the kind comments on this work.

      1) As the authors acknowledge, a key issue that cannot be fully resolved in this study is causality between the different events investigated. Overall, the authors are careful about this, but there are some exceptions that should be corrected. Probably the most important is in the abstract, where they write: “We conclude that variation in concerted processes of 5’ splice site selection by U6 snRNA is crucial to evolutionary change in splicing complexity”. I suggest they write something more open (and correct), such as: “We conclude that variation in concerted processes of 5’ splice site selection by U6 snRNA is associated with evolutionary changes in splicing complexity”. Similarly, other plausible scenarios should be discussed in the corresponding Discussion section.

      We agree with the reviewer that it is not possible to infer the causal relationship between METTL16 absence and 5’SS+4 preference change from the current data. We, therefore, apologise for failing to be more careful in the Summary and Introduction. We have reworded these statements to better reflect what we can currently say about the evolutionary relationship between METTL16 and 5’SS sequence preference.

      The correlation between METTL16 absence and 5'SS+4 sequence preference change could most likely be explained by one of several scenarios: (a) Sudden loss of METTL16 causes a rapid necessity to change 5'SS sequence preferences. This is unlikely, as such rapid change without widespread corresponding 5'SS changes would likely impose a high fitness cost. (b) Changes in 5'SS sequence preference occur first, driven by some other selective pressure, until there is no longer a benefit to retaining the METTL16 gene. (c) Gradual changes in the expression or catalytic efficiency of METTL16 reduce the stoichiometry of U6 snRNA m6A modification, which permits gradual change in 5'SS+4 sequence preference until complete loss of METTL16 no longer imposes a major fitness cost. As we suggest in the Discussion, future work could examine this question by determining whether the METTL16 orthologs found in Zygosaccharomyces and Eremothecium species, which have altered their 5'SS+4 preference to a U, are expressed and functional. We have updated the Discussion to include a new section that addresses these scenarios.

      2) I do not agree with the statement that "The extent of alternative splicing is the best genomic predictor of developmental complexity". To start with, there are many ways to quantify "extent of alternative splicing" and there are also different types of alternative splicing that might have different prevalence and biological impact. Then, this claim is usually related with exon skipping, which is tightly linked with intron length, and that is likely a better prediction of complexity (yet clearly not causative). My concern is: to what extent has this claim been formally and properly assessed by comparing splicing prevalence with other genomic features, such as intergenic region length, intron length, or average distance between enhancer-promoter interactions (arguably the most relevant predictor, in light of many other studies)? Moreover, I found it a bit misleading to frame the work presented in this study as directly related with developmental (or even splicing) complexity. The work is very interesting on its own, and I doubt their findings on +4 position preference in Saccharomycotina has anything to do with developmental complexity (as the Abstract and Introduction seem to imply).

      On reflection, we agree with the reviewer. Some of our framing of the text isn’t balanced with other studies on the scaling of alternative splicing with developmental complexity. We have edited the Summary and Introduction sections accordingly and cited other references that broaden the consideration of this subject. We are grateful to the reviewer for this suggestion because the changes we make improve the focus of the manuscript since our findings relate more to splicing simplification than to an understanding of increased developmental complexity.

      3) I found Figure 2 and its associated supplementary figure very difficult to follow. I suggest the authors try to improve it and make it clearer. Also, other trees summarizing the results might be helpful. 

      We apologise for the complexity of these figures. We opted to show phylogenetic trees with phenotypes plotted on the y axis, rather than simply trait histograms or box-plots, because the underlying structure of the tree is important for demonstrating that multiple independent changes in the 5’SS phenotype have occurred in the Saccharomycotina. We have tried to improve the comprehensibility of the figures in the following ways: (a) We have added 5’SS sequence motifs to the x-axis of figure 2B to make what the plot represents clearer, (b) as suggested by the reviewer, we have created a pruned tree showing the 5’SS motifs of a selection of Saccharomycotina species, which demonstrates that the changes in 5’SS+4 position preferences seen in S. cerevisiae and C. albicans are likely to be a result of convergent evolution. We have added this tree as Figure 2 - figure supplement 3.

      4) I also found the Results section corresponding to Figure 5B a bit confusing. I would argue (as I think the authors do) that there are two main patterns here: below 500 introns, there is no association, while above 500 introns there is an increasingly negative association (correlation). I think it would help to distinguish these two patterns more explicitly. Then, for the intron-poor species: is the correlation (or lack thereof) different for species with a T or an A in position +4?

      We do indeed think that there are two patterns here, as indicated by the reviewer. In the previous version of the manuscript, we separated species into those having an overall preference for A at the +4 position, and those having +4U. By showing regression lines for these two classes, rather than for the general relationship between intron number and U5/6rho, we somewhat imply that the switch in +4 base preference might be causing the loss of correlation between U5/6rho and intron number. However, since essentially all species with a 5'SS +4U preference are intron poor, it seems more likely that these trends are the result of a loss of the negative correlation between intron number and U5/6rho in intron poor species, as suggested by the reviewer. To address this issue, we have replaced the regression lines on Figure 6B with a single loess (locally estimated scatterplot smoothing) regression line for all species and updated the text to make it clearer that we think loss of U5/6rho and +4A preference are separate traits of intron poor species. Although this is not exactly what the reviewer requested, we hope that it satisfies their issue with the analysis.

      Reviewer #3 (Significance):

      This is a very interesting study that sheds light on an intriguing evolutionary pattern: the change in consensus sequence at position +4 of the 5' splice site. This topic is relevant since it is closely associated with intron loss and splicing efficiency and evolution. 

      We thank the reviewer for the kind and constructive comments on this study.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1. This is a good paper dealing with a gap in our knowledge of the reasons for ICB failures. The subject being difficult, it is expected that the design and content of such an experiment will be complex. But the authors forget the practicality of readers’ attention and of making the paper appear interesting. They need to organise and perhaps classify the varied information in such a way that the reader can find a rhythm in excavating the data more easily. It appears confusing at times, so they may try to make it simpler. In this way they may concentrate more on methods and classify the results too. A thorough revision is suggested to make it concise.

      Authors’ answer: We thank the Reviewer for his positive evaluation and constructive feedback. We appreciate the complexity of single-cell RNA-sequencing analyses. In order to simplify our manuscript, our revised manuscript now focuses on the transitional states of tumor-resident and circulating T cells found in ovarian cancer patients. Our study is timely as it is the first to report the developmental relationship of TILs in ovarian cancer. We substantially edited our manuscript to make it clear that our findings suggest a gradual acquisition of the exhaustion program initiated by effector-like cells (cluster CD8_GZMH) that eventually gives rise to more terminal states with features of tissue residency and chemotaxis (clusters CD8_CCL4, CD8_XCL1, and CD8_CXCL13). We also include new analyses revealing the presence and proportion of these T cell states in different cancer patients (New Fig. 4A-B), and how these T cell states associate with clinical responses to immune checkpoint blockade (ICB). We hope the Reviewer will find our revised manuscript easier to read.

      Reviewer #2. I think the first half of the article, in which the GZMH-CD8 cluster is considered to be in an intermediate state of transition to exhaustion, is interesting, and I feel that the single-cell seq and TCR data are well analyzed to make the point. On the other hand, I feel that the latter part of the paper may not be anything more than a hypothesis. In particular, the part claiming that it is related to prognosis or applicable to the prediction of the effect of ICB is insufficient, since their gene signature is not described in detail and the contents of the Figure are not mentioned in the manuscript. In the latter part, the effects of GPR183 and 25-HC, or the effects of IL21, would require experiments to verify (to verify whether the addition of chemokine or the inhibition of the receptor changes the specific CD8 population).

      Author’s answer: Thank you for discussing the limitation of the signature employed. We agree with the reviewer’s comment. Old Figure 5 has been removed from the revised manuscript.

      Reviewer #2. Minor point: In particular, there is little mention of Figure 5 in the text, making it difficult to understand.

      Author’s answer: Thank you for your comment. As we previously discussed, we have removed Figure 5 from the revised manuscript. The method used to generate the signature was found to be inappropriate.

      Reviewer #2. The latter part is difficult to understand. To begin with, it is already known that ovarian cancer does not contribute much to ICB, so what does it mean to analyze the CD8 population, which is known as a marker of ICB response in other carcinomas, as an indicator? Especially for clinicians like us, it is hard to imagine that the results will lead to clinical trials that will attempt to sort out the population that ICB is favored in.

      Author’s answer: Although immune checkpoint blockade has demonstrated limited effectiveness against ovarian cancer, subset analyses suggest superior efficacy for some patients and according to subtype. Combination anti-PD-1/CTLA-4 therapy, for instance, achieved response rates up to 31% (Zamarin et al., 2020), and superior benefit for single-agent PD-1 blockade has been reported in clear cell ovarian cancer. Moreover, encouraging clinical results have recently been reported in studies exploring combinations with PARP and VEGF inhibitors. For example, an interim analysis of the phase 3 DUO-O trial (NCT03737643) showed a statistically significant and clinically meaningful improvement in PFS in patients with newly diagnosed advanced ovarian cancer without a BRCA1/2 mutation (Harter et al., 2023).

      Our study aimed to better understand how ovarian tumor-infiltrating T cells acquire their exhaustion program after migrating from the periphery and whether these mechanisms are unique or shared amongst cancer types. Recent studies in other cancer types have shown the dynamics of T cells, demonstrated the clonal replacement of intratumoral T cells after ICB, and emphasized the role of peripheral clones in this process (Wu et al., 2020; Yost et al., 2019). In lung cancer, a transitional state between precursor and terminally differentiated cells has been proposed (Gueguen et al., 2021). Our study demonstrates, for the first time in ovarian cancer, the presence of similar transitional states of CD8 T cells. Our revised manuscript also now includes new data revealing that pre-effector GZMK- and intermediary GZMH-expressing CD8 cells are better biomarkers of ICB response than terminally differentiated XCL1- and CXCL13-expressing CD8 T cells (New Figure 4). Altogether, our study provides important and novel insights on the development of tumor-infiltrating T cells in ovarian cancer patients, which may serve to better select ovarian cancer patients for ICB therapy.

      Reviewer #2. Since the first half of the study is very interesting, we feel that it is more important to confirm the mechanism of exhaustion from the blood via the intermediate (GZMH_CD8), including functional experiments. Also, as a clinician, we are very interested in the perspective of whether some of the fractions identified in this study are different in proportion in different patients and whether they correlate with the clinical course of the disease since the study only analyzed a sample of 5 patients.

      Author’s answer: We thank the reviewer for proposing to extend our analysis. As suggested, our revised manuscript now includes new analyses that reveal the different proportions of our identified T cell states in different cancer patients (New Figure 4). We further investigated whether these T cell states associate with clinical responses and observed that pre-effector GZMK- and intermediary GZMH-expressing CD8 T cells are better biomarkers of ICB response than terminally exhausted XCL1- and CXCL13-expressing CD8 T cells (New Figure 4).

      Reviewer #3. Question 1: Whether the distribution patterns of CD4+ and CD8+ T cell clusters in Figure 1B were comparable among the 5 patient samples? Whether the proportion of five types of clones in Figure 3C are comparable among the 5 patient samples?

      Author’s answer: Thank you for the question. We included the results to answer these questions in the supplementary material (fig. S1C-D). For each patient, we calculated the proportion of a cluster among T cells in the blood or tumor. As observed in the boxplot (fig. S1C), the proportion of some subsets was higher in certain patients, such as the higher proportion of CD8_GZMK in the tumor of patient p09454. A recent study classified patients’ tumors based on the spatial distribution of CD8 T cells and performed scRNA-seq to identify cell subsets enriched in the groups inflamed/infiltrated (characterized by the distribution of CD8 T cells within the tumor epithelium), excluded (infiltrating CD8 T cells are restricted to the tumor stroma) or desert (T cells are not present or have low frequency) (Hornburg et al., 2021). Interestingly, this subset of CD8_GZMK cells was enriched in desert tumors, suggesting that the difference we observed in our dataset might reflect the spatial distribution of CD8 T cells in patient p09454. Regarding the TCR-seq data, the frequency of the five types of clones was different among patients. To show this data, we included a barplot (fig. S2D), showing, for example, a higher proportion of tumor-expanded clones in patient p10329.
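
      The per-patient calculation described above (the proportion of each cluster among T cells in blood or tumor) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `cluster_proportions`, the patient labels, and the toy cell list are hypothetical, and only the cluster names are taken from the text.

```python
from collections import Counter, defaultdict


def cluster_proportions(cells):
    """Return {(patient, tissue): {cluster: fraction of T cells}}.

    `cells` is an iterable of (patient, tissue, cluster) tuples,
    one tuple per T cell (hypothetical input layout).
    """
    counts = defaultdict(Counter)
    for patient, tissue, cluster in cells:
        counts[(patient, tissue)][cluster] += 1
    return {
        key: {cluster: n / sum(counter.values()) for cluster, n in counter.items()}
        for key, counter in counts.items()
    }


# Toy, made-up data: patient labels and cell counts are illustrative only.
toy_cells = [
    ("patient_A", "tumor", "CD8_GZMK"),
    ("patient_A", "tumor", "CD8_GZMK"),
    ("patient_A", "tumor", "CD8_GZMH"),
    ("patient_A", "blood", "CD8_GZMH"),
    ("patient_B", "tumor", "CD8_GZMH"),
    ("patient_B", "tumor", "CD8_CXCL13"),
]

props = cluster_proportions(toy_cells)
for key in sorted(props):
    print(key, props[key])
```

      Normalising within each patient and tissue, rather than across the pooled dataset, keeps patients with many sequenced cells from dominating the comparison.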

      Reviewer #3. Question 2: In Figure S2C, only a very small number of cells in the CD8-GZMK K-22 population. Are these cells representative? Do they generally exist in multiple samples or only in one sample?

      Author’s answer: Thank you for your comment. The subcluster k_22 indeed has a smaller number of cells compared to other subclusters. Nevertheless, the K_22 cluster was found in every patient and in every healthy donor. To clarify, we edited our revised manuscript to include a statement that cluster k_22 was composed of fewer cells compared to other clusters.

      Reviewer #3. Question 3: In the Fig.S6 legend, the authors stated "Our results suggest the differentiation of cluster CD8-GZMK into the effector-like subset CD8-GZMH." However, there seems to be no corresponding analysis in the main text to support this conclusion.

      Author’s answer: We appreciate your attention to this statement. We agree that the results of our study do not support this statement, and so we have removed it from the revised manuscript.

      Reviewer #3. Question 4: Is there more detailed clinical information that can be provided for the 5 patients included in the study? Per the methods all patients were receiving debulking surgery and were treatment naïve, but did they differ in stage, age, comorbidities, etc.?

      Author’s answer: Thank you for your comment on this. We have included a table with clinical information on the stage, age, and menopause status of the five patients.

      Reviewer #3. Question 5: Were any cells included for sequencing from adjacent 'normal' tissue uninvolved with tumor (these samples are from surgical debulking of primary tumors, which may include such areas of non-involved tissue.) While shared TCR clonotypes between blood and intratumoral T cells strongly suggests the tumor-resident populations are recruited from the blood, the degree of sharing with normal tissue-resident T cells would be of interest as well.

      Author’s answer: Thank you for your comment. Samples were provided for sc-RNA-seq after pathology review and validation of tumor histology. We did not perform sc-RNA-seq on normal adjacent tissue (NAT). We agree this would be interesting as a follow-up study, since in other cancer types (renal, colon and lung) it has been demonstrated that T cell clones expanded in the tumor and NAT are also present in peripheral blood (Wu et al., 2020).

      Reviewer #3. Question 6: Very little is discussed about HGSOC itself in the main text (eg clinical background, prior literature on the composition of infiltrating immune populations and potential reasons for at best modest poor responses to IO) until the first sentence of the discussion. As the entirety of the new data produced in this study is from HGSOC tumors there should be more focus on this tumor type and conversation with the prior literature on it (mainly from prior studies on the immune environment of HGSOC). Further, how distinct do the authors suspect the cell populations found in their study to be to ovarian as opposed to other epithelial tumor types?

      Author’s answer: Thank you for the suggestion. We have now included more background information on immunotherapy of HGSOC. Specifically, we added the following paragraph in our introduction: “In ovarian cancer, the presence of both T and B cells improves patients' survival (Nelson, 2015; Nielsen et al, 2012). They are usually organized in lymphoid aggregates ranging from a small group of cells to a well-organized TLS (Kroeger et al, 2016). Organized TLSs correlate with better survival, such as observed in patients treated with ICB. Although immunotherapy has demonstrated limited effectiveness against ovarian cancer, subsets of patients may thus benefit from ICB. In support of this, combination anti-PD-1/CTLA-4 therapy can achieve response rates above 30% (Zamarin et al., 2020), and encouraging clinical results have recently been reported when combining ICB with PARP and VEGF inhibitors (Harter et al., 2023)”.

      Reviewer #3. Question 7: Were the signature genes used for analysis in figure 5 chosen in a formal, unbiased manner, or simply hand-picked as representative of the respective cell types? This information is not provided in the supplement.

      Author’s answer: Another reviewer has also expressed similar concerns. The genes selected to represent cell types were chosen manually, which we acknowledge is not the best method for defining a signature. As a result, we have decided to exclude Figure 5 from the manuscript under review. We believe an unbiased approach is more suitable for characterizing the cell network proposed in our study.

      Reviewer #3. Question 8: While the NicheNet analysis of potential interactions among lymphocyte populations raises some strong hypotheses, it would be interesting to extend the interaction analysis to all CD45+ populations, given the sequencing was done on CD45+ immune cells.

      Author’s answer: Thank you for suggesting this analysis. We have included the results of the cell interaction analysis including all CD45+ cells (fig. S3). We observed CD40L as one of the top predicted ligands highly expressed in the CD4_CXCL13 subset, mediating a response in subsets of antigen-presenting cells, such as B cells (cluster B), plasma cells (cluster PC_2), and plasmacytoid dendritic cells (cluster pDC). Interestingly, this result also supports the hypothesis of Tfh-like cells (cluster CD4_CXCL13) coordinating the action of intratumoral immune cells involved in the antitumor immune response.

      Reviewer #3. Question 9: A sample size of 5 patients is relatively small for current single cell RNAseq studies of human tumor patients.

      Author’s answer: We agree with the reviewer that a sample size of 5 patients is relatively small. Thus, to validate our results in other patients, we included in the revised manuscript the analysis of scRNA-seq data from 47 patients across 10 cancer types (dataset from Zheng et al., 2021). As demonstrated in figure 3 and figure 5, we could identify the subsets of CD8 and CD4 T cells from our ovarian cancer patients in that 10-cancer-type dataset.

      Reviewer #3. Minor

      1. In lines 96-97, "CD8-GZMB" was mentioned twice in the description.

      2. In line 126, this section did not discuss residency markers, yet a conclusion about residency was made in this sentence.

      Author’s answer: We appreciate you bringing these errors to our attention. We fixed them in the updated version of the manuscript.

      References:

      Gueguen, P., Metoikidou, C., Dupic, T., Lawand, M., Goudot, C., Baulande, S., … Amigorena, S. (2021). Contribution of resident and circulating precursors to tumor-infiltrating CD8 T cell populations in lung cancer. Science Immunology, Vol. 6, p. eabd5778. doi:10.1126/sciimmunol.abd5778

      Harter, P., Trillsch, F., Okamoto, A., Reuss, A., Kim, J.-W., Rubio-Pérez, M. J., … Aghajanian, C. (2023). Durvalumab with paclitaxel/carboplatin (PC) and bevacizumab (bev), followed by maintenance durvalumab, bev, and olaparib in patients (pts) with newly diagnosed advanced ovarian cancer (AOC) without a tumor BRCA1/2 mutation (non-tBRCAm): Results from the randomized, placebo (pbo)-controlled phase III DUO-O trial. Journal of Clinical Oncology, 41(17_suppl), LBA5506–LBA5506.

      Hornburg, M., Desbois, M., Lu, S., Guan, Y., Lo, A. A., Kaufman, S., … Wang, Y. (2021). Single-cell dissection of cellular components and interactions shaping the tumor immune phenotypes in ovarian cancer. Cancer Cell. doi:10.1016/j.ccell.2021.04.004

      Wu, T. D., Madireddi, S., de Almeida, P. E., Banchereau, R., Chen, Y.-J. J., Chitre, A. S., … Grogan, J. L. (2020). Peripheral T cell expansion predicts tumour infiltration and clinical response. Nature. doi:10.1038/s41586-020-2056-8

      Yost, K. E., Satpathy, A. T., Wells, D. K., Qi, Y., Wang, C., Kageyama, R., … Chang, H. Y. (2019). Clonal replacement of tumor-specific T cells following PD-1 blockade. Nature Medicine. doi:10.1038/s41591-019-0522-3

      Zamarin, D., Burger, R. A., Sill, M. W., Powell, D. J., Jr, Lankes, H. A., Feldman, M. D., … Aghajanian, C. (2020). Randomized Phase II Trial of Nivolumab Versus Nivolumab and Ipilimumab for Recurrent or Persistent Ovarian Cancer: An NRG Oncology Study. Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology, 38(16), 1814–1823.

      Zheng, L., Qin, S., Si, W., Wang, A., Xing, B., Gao, R., … Zhang, Z. (2021). Pan-cancer single-cell landscape of tumor-infiltrating T cells. Science, 374(6574), abe6474.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary: This study used single-cell transcriptomics and T cell receptor profiling to identify the developmental relationships of T cell populations in ovarian cancer patients. The researchers proposed a model of differentiation pathway that showed how an intermediate GZMH-expressing CD8 T cell subset progressively reinforces exhaustion and tissue residency programs towards terminally exhausted cells. Then they also focus on the nature of TPEX, dual-expanded clone, which is considered an important indicator for the efficacy of ICB, and argue that it is strongly related to GPR183, 25-OHC, and IL21. Based on the analysis of clinical samples, they argue that their proposed gene signature may also be prognostically relevant and predictive of ICB efficacy.

      Major comment: I think the first half of the article, in which the GZMH-CD8 cluster is considered to be in an intermediate state of transition to exhaustion, is interesting, and I feel that the single-cell seq and TCR data are well analyzed to make the point. On the other hand, I feel that the latter part of the paper may not be anything more than a hypothesis. In particular, the part claiming that it is related to prognosis or applicable to the prediction of the effect of ICB is insufficient, since their gene signature is not described in detail and the contents of the Figure are not mentioned in the manuscript. In the latter part, the effects of GPR183 and 25-HC, or the effects of IL21, would require experiments to verify (to verify whether the addition of chemokine or the inhibition of the receptor changes the specific CD8 population).

      Minor point: In particular, there is little mention of Figure 5 in the text, making it difficult to understand.

      Significance

      It is interesting to note that the authors simultaneously analyze immune cells in the blood and in the tumor, and examine in detail what is characteristic of the blood, what is characteristic of the tumor, and what is seen in both. And it is very interesting that they specifically propose an intermediate group that is recruited from the blood to the tumor and is in the process of becoming exhausted. I am sure there are many studies on TILs and TLSs, but this study would be helpful to understand how they are concentrated locally (near the tumor) in comparison with immune cells in the blood as well.

      However, the latter part is difficult to understand. To begin with, it is already known that ovarian cancer does not contribute much to ICB, so what does it mean to analyze the CD8 population, which is known as a marker of ICB response in other carcinomas, as an indicator? Especially for clinicians like us, it is hard to imagine that the results will lead to clinical trials that will attempt to sort out the population that ICB is favored in.

      Since the first half of the study is very interesting, we feel that it is more important to confirm the mechanism of exhaustion from the blood via the intermediate state, including functional experiments. Also, as a clinician, we are very interested in the perspective of whether some of the fractions identified in this study are different in proportion in different patients and whether they correlate with the clinical course of the disease, since the study only analyzed a sample of 5 patients.

    1. Reviewer #2 (Public Review):

      Kraus, Aurora et al. investigated the potential immune response of the olfactory bulb after exposure to the infectious hematopoietic necrosis virus (IHNV) via the olfactory epithelia. Specifically, they show a) viral-specific neuronal activation of "OSNs" (crypt cells), b) changes in the behaviour of both adult and larval zebrafish after viral exposure, and c) enrichment of pituitary adenylate-cyclase-activating polypeptide (PACAP), as assayed by single-cell transcriptomic profiling of cells in the OB after OSNs are exposed to IHNV.

      Although the paper does have strengths in principle, the weaknesses of the manuscript are that these strengths are not directly demonstrated and the referencing of the manuscript omits many references important for the understanding of the questions and the results of the study. Furthermore, the data presented are not sufficient to fully support the key claims in the manuscript. In particular:

      a) Viral-specific neuronal activation of OSNs:
      What type of neurons? The authors are a bit elusive and do not clearly state that the neurons are crypt cells (Sepahi et al.: rainbow trout), which have a very specific axonal projection to the brain and whose response characteristics are not well characterized (see work of Korsching lab). Crypt cells are not present in the olfactory epithelia of mammals. Furthermore, in their previous work the crypt cells die; so how do they think the (inflammatory) virus response is transmitted to the olfactory bulbs in order to protect the brain?
      The authors state from previous work that they never detected virus in the brain, but why would they? Does IHNV move trans-synaptically?
      The neuronal activity was monitored using a pan-neuronal marker; thus these data are of limited use when trying to understand the role of neuronal activity (crypt cells) in the IHNV-triggered activity: the authors may be looking at a generalized inflammation response, and the image presented is not particularly informative, so it is difficult to decipher the results. The authors assume IHNV is an odorant without carefully ruling out the possibility of a generalized inflammation response.

      b) Changes in behaviour of both adult and larval zebrafish after viral exposure:
      What is the motivating question for looking at behaviour of the virus-infected animals? Do we know the effects of crypt cell loss on the behaviour in any fish species? The authors need to build a better conceptual framework for the behaviour experiments.

      c) Pituitary adenylate-cyclase-activating polypeptide (PACAP) was enriched when assayed by single cell transcriptomic profiling of cells in the OB after OSNs are exposed to IHNV:
      The authors draw many generous conclusions from limited data. The authors seem to have forgotten to cite previously published papers showing that PACAP-38 has anti-viral activities in fish (VHSV: trout), such as: Velasquez et al 2020, First in vivo evidence of pituitary adenylate cyclase-activating polypeptide antiviral activity in teleost.
      The histology for PACAP presented in the manuscript is not convincing. The antibody is against the human form of PACAP; thus any labelling should be treated with caution (and called PACAP-38-like).

      Summary: The authors need to better develop their model (perhaps a diagram would be helpful) explaining exactly which neurons are transmitting the information. Because of the elusive nature of some referencing, the skirting of important issues such as clearly stating which neurons are affected (crypt cells) and what the point of the behaviour is (relate to neuronal type infected by virus), and the lack of an antibody specific to the zebrafish protein, the model appears to be built on an unstable base.

    1. Joint Public Review:

      In this manuscript, the authors challenge the fundamental concept that all neurons are derived from ectoderm. The key points of the authors' argument are as follows:

      1) Roughly half of the cells in the small intestinal longitudinal muscle-myenteric plexus (LM-MP) that express a pan-neuronal marker do not, by lineage tracing, appear to be derived from the neural crest.

      2) Lineage tracing and marker gene imaging suggest that these non-neural crest derived neurons originate in the mesoderm, leading to their designation as mesodermal-derived enteric neurons (MENs).

      3) Single-cell sequencing of LM-MP tissues confirms the mesodermal origin of MENs.

      4) MENs progressively replace neural crest derived enteric neurons as mice age, eventually representing the bulk of the EN population.

      There is broad agreement among the reviewers that the identification and description of this cell population is important, and that the failure of these cells to be labeled by neural crest lineage tracers is not artifactual. The work with transgenic lines is convincing that some presumptive neurons in the enteric nervous system (ENS) likely originate from an alternative source in the postnatal intestine and that this population increases in aging mice.

      There is, however, ongoing disagreement between the authors and reviewers about whether the authors' provocative and potentially paradigm-changing proposal that these are neurons of mesodermal origin has been established. While the authors believe they have addressed the reviewers' concerns in multiple rounds of review (much of this prior to submission), the reviewers remain unconvinced and continue to request additional data and analyses.

      A key premise of the preprint review system is that the best interests of science are not served by endlessly litigating disagreements around papers by either compelling the authors to do extensive and expensive additional experiments that they do not believe to be necessary or by treating the authors' claim as established in the face of continued skepticism. Accordingly the editor believes it is time to present this work, which everyone agrees contains important observations and valuable data, along with the following editor's synthesis of the reviewers' concerns and author responses about the question of these cells' origins. We encourage anyone interested in the details to review the already posted reviews and authors' response.

      The following key issues have been raised during review:

      * Is the lineage tracing and marker gene expression data definitive as to mesodermal origin?

      * Are the cells analyzed in the genomic experiments the same as those identified in the lineage tracing experiments?

      * Does the genomic data establish that the sub-population of cells the authors focus on are of mesodermal origin?

      * Are there alternative explanations for the lineage tracing and genomic observations than a mesodermal origin?

      * Is the lineage tracing and marker gene expression data definitive as to mesodermal origin? *

      The proximal evidence that the authors present for a mesodermal origin of the non-NC derived cells is presented in Figure 2, which establishes the presence, via lineage tracing of Tek+ and Mesp1+ (and therefore mesoderm derived) and Hu+ (and therefore neuronal) cells. The fraction of lineage labeled cells in each case (~50%) corresponds roughly to the fraction of cells that do not appear to be NC derived.

      The reviewers raise several technical questions about the lineage tracing experiments, including issues of incomplete labeling, ectopic labeling and toxicity. The authors have addressed each of these with data and/or citations, and the editor believes they have demonstrated, subject to the broader limits of lineage tracing experiments, that there are Hu+ cells in the tissue that are derived from cells that do not express NC markers and that do express mesodermal markers.

      One reviewer raised the question of whether these cells are neurons. This appears to the editor to be a valid question, in that specific neuronal activity of these cells has not been established. But the authors' argument is persuasive that their Hu+ state would have led them to be designated neurons and that changing that designation based on not being derived from NC is circular. However the possibility that, despite this accepted designation, these cells are not functionally neurons should be noted by readers.

      * Are the cells analyzed in the genomic experiments the same as those identified in the lineage tracing experiments, and does this data establish mesodermal origin? *

      To provide orthogonal evidence for the presence of mesodermally derived enteric neurons, the authors carried out single-cell sequencing of dissociated cells from hand-dissected longitudinal muscle - myenteric plexus (LM-MP) tissue. They use standard methods to identify clusters of cells with similar transcriptomes, and designate, based on marker gene expression, two clusters to be neural crest derived enteric neurons (NENs) and mesoderm derived enteric neurons (MENs). However, the reviewers raised several issues about the designation of the cells as MENs, and therefore their equation with the cells identified in lineage tracing.

      While the logic behind specific choices made in the single-cell analysis is not always clear in the manuscript, such as why genes not specific to MENs were used to identify the MEN cluster and how genes were selected for subsequent analysis (although both issues are explained better in the authors' response to reviewers), the authors in the end identify a single large cluster that has the characteristics of MENs (it expresses both neuronal and mesodermal markers) and that is (by immunohistochemistry) broadly associated with the previously described tissue MENs.

      The standard methods for the delineation of clusters in single-cell sequencing data (which the authors use) are stochastic and defy statistical interpretation, and the way these data and analyses are used is often subjective. The editor shares the reviewers' confusion about aspects of the analysis, but also finds the authors' assertions that they have described a cluster of cells that express both neuronal and mesodermal genes, and that this cluster corresponds to the tissue MENs described in lineage tracing, to be broadly sound.

      The biggest weakness in the single-cell data and analysis - identified by all reviewers - is the massive overrepresentation of MENs relative to NENs. The authors' explanation - that some cells are more sensitive to manipulations required to prepare cells for sequencing - is certainly well-represented in the literature and is therefore plausible. But it isn't fully satisfactory, especially because it undermines the notion that the MENs and NENs are functionally equivalent (though one could argue in response that increased fragility of NENs is why they are progressively replaced by MENs).

      There are many additional questions about the single cell analysis that are difficult to resolve with the data in hand. I think everyone would agree that an ideal analysis would have more cells, deeper sequencing, and comprehensive validation of the identity of each cluster of cells. But given the time and expense required to carry out such experiments, we cannot demand them, and must take the data for what they are rather than what they could be. And in the end, it is the editor's view that these data and analyses bolster the authors' claims, without conclusively establishing them. That is, these data should neither be dismissed nor, on their own, considered definitive.

      * Are there alternative explanations for the data than that they are mesodermally derived neurons? *

      As discussed above, the reviewers generally agree that the lineage tracing experiments are careful and well-executed, and the authors have provided data demonstrating that the results are highly unlikely to be due to either incomplete or ectopic lineage marking. The reviewers raise several possible alternative hypotheses, some based on the literature and some based on the genomic data. The authors discuss each in detail in their response. The editor would note that, at this stage in the history of single-cell analysis, the criteria for using single cell sequencing data to establish cell type and cell origin are not well established, and that neither the presence nor the absence of specific sets of genes in single cells should, for both technical and biological reasons, be considered dispositive as to identity.

      * Additional aspects of paper: *

      There are additional intriguing aspects of the paper, especially the increase in the number of MENs relative to NENs over time, suggesting functional replacement of one population with the other, and some evidence for and speculation about what might be regulating this evolution. However these are somewhat secondary points relative to the central question at hand of whether the authors have discovered a population of mesodermally derived neurons.

      * Editor's summary and comment: *

      The editor believes it is a fair summary to say that the authors believe they have gone to great lengths to provide multiple lines of evidence that support their hypothesis, but that these reviewers, while appreciating the potential importance of the authors' discovery of an unusual cell type, are not yet convinced of its origin.

      In an ideal world, the authors, reviewers and editor would all ultimately agree on what claims the data presented in a paper supports, and indeed this is what the traditional journal publishing system tries to achieve. But the system fails in cases like this where no consensus between authors and reviewers can be reached, as it neither makes sense to "accept" the paper and imply that it has been endorsed by the reviewers, nor to "reject" it and keep the work in peer review limbo.

      There is certainly enough here to warrant the idea and the data and arguments behind it being digested and considered by people in the field. It may very well be that the authors - who have spent years working on this problem and likely know more about this population of cells than anyone on Earth - are right that they have discovered something that changes how we think about the development of the nervous system. To the extent the reviewers are representative, people are likely to need additional data to be convinced. But it is time to put that to the test.

    1. Reviewer #3 (Public Review):

      This manuscript aims to exploit experimental measurements of the extracellular voltages produced by colliding action potentials to adjust a simplified model of action potential propagation that is then used to predict the extracellular fields at axon terminals. The overall rationale is that when solving the cable equation (which forms the substrate for models of action potential propagation in axons), the solution for a cable with a closed end can be obtained by a technique of superposition: a spatially reflected solution is added to that for an infinite cable and this ensures by symmetry that no axial current flows at the closed boundary. By this method, the authors calculate the expected extracellular fields for axon terminals in different situations. These fields are of potential interest because, according to the authors, their magnitude can be larger than that of a propagating action potential and may be involved in ephaptic signalling. The authors perform direct measurements of colliding action potentials, in the earthworm giant axon, to parameterise and test their model.
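
      The superposition argument summarized above can be written out explicitly for the simplest case. The following is only a sketch, assuming a passive (linear) cable with length constant $\lambda$ and time constant $\tau$; these symbols and the notation $V_\infty$ for the infinite-cable solution are introduced here for illustration and are not taken from the manuscript. For an active membrane the same boundary condition follows from the mirror symmetry of the collision rather than from linearity.

      ```latex
      % Passive cable equation (assumed form; V is the deviation from rest)
      \lambda^2 \frac{\partial^2 V}{\partial x^2} = \tau \frac{\partial V}{\partial t} + V

      % The equation is invariant under x -> -x, so if V_\infty(x,t) solves it on the
      % infinite cable, then so does V_\infty(-x,t). By linearity their sum is a solution:
      V(x,t) = V_\infty(x,t) + V_\infty(-x,t)

      % The sum is even in x, so its spatial derivative vanishes at x = 0:
      \left.\frac{\partial V}{\partial x}\right|_{x=0}
        = \left.\frac{\partial V_\infty}{\partial x}\right|_{(0,t)}
        - \left.\frac{\partial V_\infty}{\partial x}\right|_{(0,t)} = 0

      % The axial current is proportional to -\partial V / \partial x, so no axial current
      % flows at x = 0, which is the sealed-end (closed boundary) condition.
      ```

      Restricting this solution to $x \le 0$ gives the intracellular voltage of a cable closed at $x = 0$; as the review notes, this symmetry argument concerns the intracellular problem and does not by itself impose a reflecting boundary on the extracellular field.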

      Although simplified models can be useful and the trick of exploiting the collision condition is interesting, I believe there are several significant problems with the rationale, presentation, and application, such that the validity and potential utility of the approach is not established.

      Simplified model vs. Hodgkin and Huxley
      The authors employ a simplified model that incorporates a two-state membrane (in essence resting and excited states) and adds a recovery mechanism. This generates a propagating wave of excitation and key observables such as propagation speed and action potential width (in space) can be adjusted using a small number of parameters. However, even if a Hodgkin-Huxley model does contain a much larger number of parameters that may be less easy to adjust directly, the basic formalism is known to be accurate and typical modifications of the kinetic parameters are very well understood, even if no direct characterisations already exist or can be obtained. I am therefore unconvinced by the utility of abandoning the Hodgkin-Huxley version.

      In several places in the manuscript, the simplified model fits the data well whereas the Hodgkin-Huxley model deviates strongly (e.g. Fig. 3CD). This is unsatisfying because it seems unlikely that the phenomenon could not be modelled accurately using the HH formulation. If the authors really wish to assert that it is "not suitable to predict the effects caused by AP [collision]" (p9) they need to provide a good deal more analysis to establish the mechanism of failure.

      (In)applicability of the superposition principle
      The reflecting boundary at the terminal is implemented using the symmetry of the collision of action potentials. However, at a closed cable there is no reflecting boundary in the extracellular space and this implied assumption is particularly inappropriate where the extracellular field is one objective of the modelling, as here. I believe this assumption is not problematic for the calculation of the intracellular voltage, because extracellular voltage gradients can usually be neglected, but the authors need to explain how the issue was dealt with for the calculation of the extracellular fields of terminals. I assume they were calculated from the membrane currents of one-half of the collision solution, but this does not seem to be explained. It might be worth showing a spatial profile of the calculated field.

      Missing demonstrations
      Central analytical results are stated rather brusquely, notably equations (3) and (4) and the relation between them. These merit an expanded explanation at the least. A better explanation of the need for the collision measurements in parameterising the models should also be provided.

      Adjusted parameters
      I am uncomfortable that the parameters adjusted to fit the model are the membrane capacitance and intracellular resistance. These have a physical reality and could easily be measured or estimated quite accurately. With a variation of more than 20-fold reported between the different models in Appendix 2 we can be sure that some of the models are based upon quite unrealistic physical assumptions, which in turn undermines confidence in their generality.

      On p8, the values of both the extracellular (100 Ohm m) and intracellular (1 Ohm m) resistivity appear to be in error, especially the former.

      (In)applicability to axon terminals
      The rationale of the application of the collision formalism to axon terminals is somewhat undermined by the fact that they tend not to be excitable. There is experimental evidence for this in the Calyx of Held and the cerebellar pinceau. The solution found via collision is therefore not directly applicable in these cases.

      Comparison with experimental data
      More effort should be made to compare the modelling with the extracellular terminal fields that have been reported in the literature.

      Choice of term "annihilation"
      The term annihilation does not seem wholly appropriate to me. The dictionary definitions are something along the lines of complete destruction by an external force or mutual destruction, for example of an electron and a positron. I don't think either applies exactly here. I suggest retaining the notion of collision which is well understood in this context.

    1. Author Response

      We thank the Editor and the Reviewers for the kind words, the helpful suggestions, and the points of critique, which have all helped us substantially strengthen the manuscript. We have made the aesthetic changes requested by Reviewer 2.

      Response to Reviewer 2

      We thank the Reviewer for their thorough feedback. We provide point by point responses below.

      Concern 1

      In paragraph 4.2, I found it unclear why the authors find it unsurprising that different experiments would correspond to different betas. I think that this point should be discussed, as beta and N appear in combination in determining the interaction strength. Otherwise, they could try to fit all distributions with the same beta, which would be more natural for me. I guess that the fits would be anyway good to the eye, though quantitatively suboptimal (which could be quantified with the distance introduced).

      The reviewer raises valid concerns since, as shown in Fig 3, the chosen values for beta, the additional fitting parameter introduced in the agent-based simulation, are: β = 0.18, 0.13, 0.12 and 0.64 respectively for N = 5, 10, 15, 20. We (RS, OM, and OP) find it intriguing that the optimum beta clusters around similar values for N = 5, 10, 15, while the optimum beta for N = 20 is significantly different. We acknowledge that we do not have an explanation for why the fitted parameter values are what they are, but note that the fitting curve is flat, implying that several beta values could possibly achieve a satisfactory fit. While further agent-based simulations could explore these findings more systematically, we believe that investigating this matter is outside the scope of this paper. Instead, we have acknowledged these points explicitly in the revised discussions.

      Portion added to discussions: “As shown in Fig. 3, the chosen values for beta, the additional fitting parameter introduced in the agent-based simulation, are: β = 0.18, 0.13, 0.12 and 0.64 respectively for N = 5, 10, 15, 20. Perhaps it is intriguing that the optimum beta clusters around similar values for N = 5, 10, 15, while the optimum beta for N = 20 is significantly different. While we do not currently have an explanation for why the fitted parameter values are what they are, we note that the fitting curve is flat, implying that several beta values could possibly achieve a satisfactory fit. Further agent-based simulations could explore these findings more systematically, and provide useful insights.”

      Concern 2

      Citation of previous work on dynamical quorum sensing (lines 51 & 52) I think misses two important points: first, these works (and others following them) deal with the appearance of collective oscillations at high density (therefore, the same general problem addressed here); second, Taylor et al. also studied a transition where the oscillators involved did not oscillate at low density, whereas above a density threshold, they display coherent collective oscillations whose period decreases with density - similar to what is observed here. I do not think this takes anything away from the originality of this work, which refers to a different system, and models it with different equations, but the parallelism between integrate-and-fire dynamics with quenched noise and excitable dynamics in the presence of noise should in my opinion not be overlooked.

      We have explicitly mentioned this in the revised text.

      Concern 3

      As the authors stress in lines 105 and 132, the analytical model shows that all that really matters in this phenomenon is the fastest frequency of the system. This could be used as an argument to say that the actual frequency distribution of individual fireflies is not all that important, as long as their fastest frequency is comparable. The assumption that they are identical would then sound less radical. Ideally, one could use the numerical simulations to check this, as well as the fact that the phenomenon does not break down when the shortest individual interburst interval Tbmin is narrowly distributed (which could also explain why having a few individuals who can flash at a higher frequency does not affect the outcome).

      We thank the reviewer for these observations.

      Concern 4

      I still feel that the agreement between the model and observations is a bit overstated (line 120). At least, I think the authors may stress that whereas the model predicts that the frequency of the 7-14 minutes oscillations should increase a lot with N, this is not observed in the data. Maybe this mismatch would be reduced if inter-individual variability was added.

      Please see the last three paragraphs of the discussion section. In reality, as the swarm size increases, we expect that swarms will no longer be all-to-all connected, and the dynamics of the system will depend upon the speed of propagation of information across the swarm. Precisely how this happens is outside of the scope of the current experimental work and theoretical description presented here.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      In the manuscript entitled "Aurora A mediated new phosphorylation of RAD51 is observed in Nuclear Speckles", the authors identify serine S97 as a novel phosphorylation site of the RAD51 recombinase and show, using a set of in vitro and in cellulo experiments, that this phosphorylation is mediated by the Aurora A kinase. The authors also describe this phosphorylation as occurring in the nucleus, specifically in nuclear speckles where mRNA maturation and splicing occur, suggesting a role of RAD51 in the latter. The confocal microscopy images provided to test this hypothesis are convincing. However, also using confocal images, the authors claim that foci of RAD51 phosphorylated at S97 do not colocalize with the DNA damage marker γ-H2AX, and hence that its function is not related to DNA damage; the data provided do not fully support this statement. In this study, Alaouid et al. utilize mutants of RAD51 that alter S97 phosphorylation to further study its function and provide data that support RAD51 as an RNA binding protein. Overall, the manuscript shows some interesting observations that are worth pursuing; however, the in vitro and in cellulo results are not aligned, lack some controls, and many points should be reconsidered.

      Major comments:

      • Are the key conclusions convincing?

      Not as stated.

      Fig. 1A. The authors conclude that pS97-RAD51 favors RAD51 strand invasion capacity using the D-loop assay. Indeed, the S97D phosphomimic increased the D-loop activity 2.5-fold compared to WT RAD51. However, the S97A mutant, which is the non-phosphorylated form, also increased the D-loop activity by 2-fold compared to WT (figure 1C). So, either the phosphorylation or the absence of it seems to promote strand invasion. So, what is the role of the phosphorylation? There is no discussion about this. Besides, no representative image of the D-loop assay is shown; this is very important as these experiments need to be run with the relevant controls to be meaningful.

      Fig. 1D. The polymerization rate of RAD51 is probably irrelevant for its function in the absence of DNA. What do they want to get at with this assay?

      In figure 2B, the authors conclude that RAD51 phosphorylation at S97 is dynamically regulated throughout the cell cycle. Indeed, the pS97-RAD51 is well observed in asynchronous cells, and the double thymidine block time course experiment followed by PI staining shows the oscillation of the pS97-RAD51 from G1 to G2/M stage. The authors quantified the ratio of pS97-RAD51/total RAD51 to conclude this. However, it would be more accurate to also divide the above over the intensity of the loading control (tubulin) because in figure 3A for example, they quantified the ratio of pS97-RAD51/tubulin but did not consider the levels of RAD51 in their quantifications.

      In figure 3B, the authors state that pS97-RAD51 is decreased after CPT treatment and that the pS97-RAD51 foci do not localize with the DNA damage marker γ-H2AX. The signal of γ-H2AX is already odd as it does not change from Ctrl to CPT conditions (especially in HCC1806 cells). A pre-extraction of soluble protein with CSK should be used before looking at the co-localization; with the pan-staining of the two signals it is difficult to draw any conclusions about colocalization. Nevertheless, the signal of RAD51 seems equal in all conditions in the images shown and it does not seem to be reduced after CPT.

      In figure 4A, the authors show that Aurora A is responsible for the S97-RAD51 phosphorylation in cellulo. Indeed, the use of an Aurora A inhibitor reduces the pS97-RAD51 signal, however, this is only true in one cell line (HCC1806) but no effect was observed in HeLa cells. Is this effect cell-specific?

      The authors find that RAD51 binds both DNA and RNA and measure the affinities of the RAD51 bearing the S97D and S97A mutations. S97D shows the highest affinity for ssDNA and RNA in Fig. 7A, B; however, the opposite is true for dsDNA in Fig 7C, D. All three forms of RAD51 bind RNA, although with different affinities; however, no error bars are shown. The description of the results does not seem accurate. Importantly, these data should somehow correlate/be discussed with respect to the D-loop assay performed in Fig. 1. The authors conclude that the binding to RNA is reduced in S97D-RAD51, suggesting that the pRAD51 that they observe at nuclear speckles would probably not be associated with RNA at these nuclear speckles, right? This goes against their idea of this phosphorylated form being related to RNA splicing...

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      The manuscript seems to be in its early days and requires a lot of editing and rewriting to relate the in vitro and in cell data and make a coherent story.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The authors performed chromatin fractionation to determine the correct localization of the pS97-RAD51 and looked for the phosphorylated form by western blots. But then they confirmed the finding using immunofluorescence. I think it would be more convincing and consistent if the authors did a pre-extraction before they use the antibody because, that way, they would indeed be confirming that the protein they are looking at is localized specifically in the nucleus.

      As well, in order to test the specificity of the pS97-RAD51 antibody they generated, a simple treatment of the lysates with phosphatases would be a good control for the specificity of their antibody. These and the criticisms mentioned above need to be addressed.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      This manuscript is not ready for submission.

      • Are the data and the methods presented in such a way that they can be reproduced?

      Yes. However, the legends of the images are way too concise.

      • Are the experiments adequately replicated and statistical analysis adequate?

      In Fig. 2B, the authors performed a double thymidine block followed by a time course release to track cell cycle progression of the cells and phosphorylation of RAD51 at S97. They do not indicate the biological replicates they performed. There are no error bars in the estimated KD shown in Fig.7.

      Minor comments:

      • Specific experimental issues that are easily addressable.

      The authors conclude that the S97 is specifically phosphorylated by the Aurora A kinase. How? Have they looked at other documented kinases known to phosphorylate RAD51?

      In figure 6 the authors overexpress HA-tagged RAD51 proteins corresponding to WT, S97D and S97A mutants in cells and label them for immunofluorescence. Maybe it would be better to downregulate the endogenous RAD51 to discard possible combined effects.

      In figure

      • Are prior studies referenced appropriately?

      The authors show in their manuscript that RAD51 protein CAN interact with RNA in vitro, a finding not previously described to my knowledge. However, a recent study entitled "RAD51-dependent recruitment of TERRA lncRNA to telomeres through R-loops, Nature, 2020" provides in vitro data showing the binding of RAD51 to TERRA, a lncRNA, which I think would be worth mentioning in their manuscript.

      The authors should mention previous contributions in the field especially when it comes to RAD51 in the HR pathway post DNA damage, which is quite documented and updated. For example, in this section of the introduction, "RAD51 is a recombinase protein implicated in the strand exchange mechanism during the DSB repair by the Homologous Recombination (HR) pathway. In the absence of DNA Damage (DD), RAD51 is predominantly cytoplasmic and translocates to the nucleus during the DNA Damage Response (DDR) to manage HR repair. As it needs the undamaged sister chromatid as a template, the HR repair pathway occurs mainly in the late S, G2 phases of the cell cycle. However, it has been documented that HR repair can also occur during G1 and early S phases, and in this case, the undamaged template used for the repair could be the homologous chromosome or an RNA transcript2". This statement is definitely worth more references.

      The same problem is recurrent in the rest of the introduction; therefore, it needs to be updated and better referenced.

      • Are the text and figures clear and accurate?

      The text needs a lot of editing to accurately describe the results, see for example: "The resulting KD evaluation shows that the S97D mutant had a dsDNA binding affinity lower to that of the WT (a KD of 2.26 μM for the S97D-RAD51 vs a KD of 0.38 μM for the WT RAD51). Concerning, the S97A mutant comparison to the WT RAD51, we observed modified association and dissociation curves that resulted in an identical affinity to dsDNA (a KD of 0.33 μM for the S97A-RAD51 vs a KD of 0.38 μM for the WT RAD51). We can conclude that in our in vitro conditions, the Ser97 phosphorylation has a high impact on RAD51 affinity for DNA by dividing its affinity by 5.8." Besides, the figures are of low quality and should be more carefully crafted and presented. Some experiments (such as the D-loop) are not represented in the figures.

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      Using a different representation for the graphs would be a plus (also see previous comments)

      Referees cross-commenting

      I think the other reviewers and I have raised very important and complementary points that will help the authors improve the quality of the manuscript substantially.

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      The discovery of a new phosphorylation site in RAD51 (S97) by Aurora A is potentially interesting for the field of the maintenance of genome stability as it could broaden the understanding of how such an important recombinase may be regulating the maintenance of genome integrity throughout the cell cycle. Also, the idea of RAD51 being involved in splicing and mRNA maturation seems very attractive and a very important conceptual advance. However, given the preliminary state of the text and the figures, the manuscript falls short of showing convincing evidence.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      Many works are highlighting the role of RNA binding proteins as an integral part of the DNA damage response. In addition, a wealth of evidence in the literature suggests that many DNA repair proteins are RNA binding proteins, and that RNA is an important player in the DDR. The possible finding that RAD51 interacts with RNA and localizes to nuclear speckles, possibly acting in splicing, is very interesting and attractive. How Aurora A is involved in this, what the trigger is, and whether RAD51 is binding RNA at these sites is still unclear.

      • State what audience might be interested in and influenced by the reported findings.

      Labs working in genome integrity mechanisms and the crosstalk between transcription and DNA repair would be interested.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view.

      Genome Instability, homologous recombination

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the Reviewers for their careful reading and the many thoughtful suggestions to improve our manuscript, as well as both the Editors and Reviewers for the generally positive evaluations and encouraging statements.

      Editorial assessment:

      This important work presents an interesting perspective for the generation and interpretation of phase precession in the hippocampal formation. Through numerical simulations and comparison to experiments, the study provides solid evidence for the role of the DG-CA3 loop in generating theta-time scale correlations and sequences, which would be reinforced through the clarification of the concepts introduced in the study, in particular the notion of intrinsic and extrinsic sequences. This study will be of interest for the hippocampus and neural coding fields.

      We appreciate that our work has been considered important. In our revision we made a considerable effort to improve the presentation of our results and the justification of our model assumptions. In particular, we aimed to clarify the meaning of intrinsic and extrinsic sequences by additional figure panels as well as fleshing out their definition via spike-timing correlations being independent or dependent on the direction of the running trajectory, respectively. To address all the requests, we added 3 new Figures, multiple new Figure panels and simulated a new model variant.

      Reviewer #1 in their public review assessed ”The manuscript has the potential to contribute to the way we interpret hippocampal temporal coding for navigation and memory.”

      They criticized

      • The findings generally relate to network models of phase precession (reviewed in e.g., Maurer and McNaughton, 2007, Jaramillo and Kempter, 2017). An important drawback of these models with respect to explaining specific experimentally observed features of phase precession, is that they cannot straightforwardly explain phase precession upon first exposure to a novel track. This is because, specific connectivity in network models may require experience-dependent plasticity, which would not be possible upon first exposure. This is essential, given that the manuscript addresses the possible origin of phase precession in terms of network models and at minimum, this weakness should be discussed.

      We agree with Reviewer #1 (and also with Reviewer #2, who brought up a similar point) that models based on recurrence struggle to explain how the recurrent connectivity matrix should come about. While we feel that a full model of how the 2-d topology in the recurrent weights can be learned goes far beyond the scope of this paper (and to our knowledge has not been solved so far in any existing model), we added a new model variant (new Figure 6 and Supplementary Figure 1), which explains the basic phenomenology of extrinsic and intrinsic sequences without the need of recurrent connections, only using feed-forward synaptic facilitation. Thus, assuming recurrent connections is not necessary for our main findings. However, we would like to point out that this does not exclude the possibility that recurrent connections, if set up in an appropriate way, also contribute to phase precession and theta sequences.

      • An important and perhaps essential component of the manuscript, is the distinction between extrinsic and intrinsic models. However, the main concepts on which this hinges, namely extrinsic and intrinsic sequences (and the related extrinsicity and intrinsicity) could be better explained and illustrated. Along these lines, the result suggested by the title, namely, hippocampal theta correlations, may be important yet incidental in light of the new concepts (e.g., extrinsicity, intrinsicity) and computational models (e.g., DG-CA3 recurrent loop) that are put forward.

      We have added substantial new explanatory material to the figures, captions and text to more didactically introduce the concepts of intrinsicity and extrinsicity. We have also completely rewritten the abstract and added a subtitle: ”extrinsic and intrinsic sequences”.

      • The study seems to put forward novel computational ideas related to neural coding. However, assessing novelty is challenging as this manuscript builds on previous work from the authors, including published (Leibold, 2020, Yiu et al., 2022) and unpublished (Ahmadi et al., 2022. bioRxiv) work. For example, the interpretation of intrinsic sequences in terms of landmarks had been introduced in Leibold, 2020.

      We agree with the reviewer that this paper touches on many related ideas from previous papers (not only of our lab) and is supposed to tie up loose ends. Thus, the novel contribution is a biologically plausible mechanistic model of how intrinsic sequences and 2-d place maps interact on the level of interconnected spiking neurons. Such a level of explanation has not yet been available in previous work. We have considerably extended the Discussion section in our revision detailing the bigger picture underlying this theory. Also, our addition of the non-recurrent model variant (see above) adds considerable novelty, since it provides an account of phase precession and preplay in novel environments.

      • The significance of the readout tempotron neuron could be expanded on. In particular, there is room for interpretation of the output signal of that neuron (e.g., what is the significance of other neurons downstream? What is the rationale for this output being theta-modulated?)

      We have added an additional Figure 8 to better illustrate the inner workings of the tempotron. We also extended the discussion to better explain the potential use of the tempotron output (see above). In short, we consider the tempotron to signal a unique behaviorally important context that is independent of remapping induced by changes of sensory cues, which is a new prediction of the model. Since the context signal results from DG loops, it requires a stable code to also exist in the DG. Evidence for such long-term stability in DG has been found in Hainmüller & Bartos (2018).

      Reviewer #2 in their public review finds ”this research topic to be both important and interesting” and appreciates ”the clarity of the paper”, commending our ”efforts to integrate previous theories into their model and conduct a systematic comparison”.

      We are very happy about these positive remarks and sincerely would like to thank the reviewer!

      Reviewer #1 made the following specific recommendations for changes:

      The abstract is somewhat difficult to parse. I have identified some words and/or sections that could be improved.

      • ’ ....inherently 1 dimensional’. This statement seems to be related to an a priori interpretation of the authors. On the other hand, if offline sequences are trivially 1 dimensional because they are sequences (i.e., they constitute a vector), then online sequences would be 1-dimensional as well. What is the key difference between offline and online? Is it the omnidirectional place fields in two dimensions? Perhaps more importantly, how relevant is this fact with respect to the main results of the manuscript, which concern extrinsic and intrinsic sequences?

      We indeed meant that the sequences are trivially 1-dimensional. The main challenge that we would like to address in this paper is how a 2-d topology of place cells (and direction dependent theta sequences) and a 1-d sequence topology of intrinsic theta correlations and during (p)replay can be reconciled. We hope this has become clearer in the rewritten abstract.

      • The language in lines 36-38 is overly technical. I suggest modifying it: the language was less technical and more understandable in the body of the manuscript, which should also be reflected in the Abstract.

      We would like to apologize for making the abstract too technical. Also in response to Reviewer #2, we decided to rewrite the abstract entirely.

      The authors use a mixture of conductance based models and Izhikevich neurons, presumably for the spiking generating mechanism. The conductance component can be readily interpreted in terms of the underlying biophysics. The Izhikevich neuron model, however, is phenomenological. I suggest you address 1) the rationale for using Izhikevich model, 2) its biophysical interpretation, 3) and its combination with conductance-based currents.

      The reviewer is correct that spike generation is modelled using Izhikevich’s model whereas synaptic integration is included in a conductance-based manner. As suggested by the reviewer, we have added further explanation in the Methods part, explaining that the Izhikevich approach allows to adjust burst firing properties with only few parameters by efficiently emulating the bifurcation structure of spike generation in the full biophysical model (1&2) and otherwise has no effect on the integration of conductance-based synaptic currents in a subthreshold regime (3).
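
      As an illustration of how the two ingredients combine, the following is a minimal sketch of an Izhikevich neuron driven by a conductance-based excitatory synapse. It is not the implementation used in the manuscript: the parameter values (a, b, c, d) are the widely used regular-spiking values from Izhikevich's original formulation, and the constant conductance `g_exc`, the reversal potential `e_rev_mv`, and the function name are assumptions made for this example.

```python
def simulate_izhikevich(g_exc, t_max_ms=500.0, dt_ms=0.1,
                        a=0.02, b=0.2, c=-65.0, d=8.0, e_rev_mv=0.0):
    """Euler-integrate one Izhikevich neuron with a constant excitatory conductance.

    All parameter values are illustrative assumptions, not the manuscript's.
    Returns the list of spike times in ms.
    """
    v = c          # membrane potential (mV), started at the reset value
    u = b * v      # recovery variable, started on its nullcline
    spike_times = []
    n_steps = int(round(t_max_ms / dt_ms))
    for step in range(n_steps):
        # Conductance-based synaptic current: depends on the driving force (E_rev - v).
        i_syn = g_exc * (e_rev_mv - v)
        # Izhikevich spike-generation dynamics (phenomenological part).
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + i_syn
        du = a * (b * v - u)
        v += dt_ms * dv
        u += dt_ms * du
        if v >= 30.0:              # spike cutoff: record, then reset
            spike_times.append(step * dt_ms)
            v = c
            u += d
    return spike_times


if __name__ == "__main__":
    print(len(simulate_izhikevich(g_exc=0.2)), "spikes with synaptic drive")
    print(len(simulate_izhikevich(g_exc=0.0)), "spikes without drive")
```

      In this sketch the synaptic term enters the voltage equation only through the driving force, so the subthreshold integration is conductance-based while the spike and reset are handled by the phenomenological Izhikevich terms.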

      Line 126: when you say preferred angle, do you mean preferred (heading) direction? If so, please maintain consistency throughout.

      We thank the reviewer for pointing out the inconsistency. We have added the word ”heading” throughout the manuscript whenever appropriate. To further improve the consistency, we have clarified the meanings of ”best” (or ”worst”) direction and reserved the use of it solely for cases when trajectory direction is compared with the preferred heading direction, namely, ”best” (”worst”) direction when trajectory is along (opposite) the preferred heading direction.

      Line 174: When discussing cross-correlation, sometimes you mean a cross-correlation function between two place fields and sometimes the histogram of all such correlations? Please clarify.

      We used histograms to empirically estimate the underlying cross-correlation function. For clarity, we have specified that it is a cross-correlation histogram in the revised manuscript whenever we refer to the empirical estimate.
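
      For readers unfamiliar with the distinction, the sketch below shows one simple way to build a cross-correlation histogram from two spike trains: all pairwise spike-time differences within a window are counted into lag bins. The function name, the window, and the bin width are assumptions chosen for illustration and are not taken from the manuscript.

```python
def cross_correlation_histogram(spikes_a, spikes_b, max_lag, bin_width):
    """Count pairwise lags (t_b - t_a) that fall within [-max_lag, max_lag).

    Returns (bin_centers, counts). Times, max_lag and bin_width share one unit.
    """
    n_bins = int(round(2 * max_lag / bin_width))
    counts = [0] * n_bins
    for t_a in spikes_a:
        for t_b in spikes_b:
            lag = t_b - t_a
            if -max_lag <= lag < max_lag:
                # Guard against floating-point rounding at the upper edge.
                idx = min(int((lag + max_lag) / bin_width), n_bins - 1)
                counts[idx] += 1
    centers = [-max_lag + (k + 0.5) * bin_width for k in range(n_bins)]
    return centers, counts


if __name__ == "__main__":
    # Cell B fires 25 ms after cell A on every theta cycle (toy data, in seconds).
    a = [0.0, 0.1, 0.2]
    b = [0.025, 0.125, 0.225]
    centers, counts = cross_correlation_histogram(a, b, max_lag=0.05, bin_width=0.01)
    peak = counts.index(max(counts))
    print("peak lag bin centre (s):", centers[peak])
```

      The position of the histogram peak serves as the empirical estimate of the correlation lag between the two cells.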

      Figure 3:

      Understanding the difference between extrinsic and intrinsic sequences is fundamental for the manuscript. I suggest that in the section that refers to Figure 3 (or Figure 3 itself), you kindly provide an example depicting how extrinsic and intrinsic sequences can

      1) coexist yet be distinctly identified

      2) depend on trajectory

      3) depend on DG input

      By coexistence, we meant the heterogeneous population of extrinsic and intrinsic cell pairs and, hence, the extrinsic and intrinsic theta correlations, as shown in Figure 3J. To improve the clarity, we added the following sentence in the section that refers to Figure 3: ”In our simulation, extrinsically and intrinsically driven cell pairs are both present in the population (Figure 3J), indicating a coexistence of extrinsic and intrinsic sequences.” To illustrate how extrinsic and intrinsic sequences depend on both trajectory and DG recurrence, we have also added annotations in Figure 3F to mark the extrinsic and intrinsic part of the sequence.

      Moreover, the caption of Figure 3 refers to the directionality of the theta sequences. How does this again relate to the extrinsic/intrinsic distinction?

      We hope the highlighting in panel F of Figure 3 has resolved this problem.

      Figure 5:

      • This is a crucial figure that should illustrate the differences between extrinsic and intrinsic sequences, as the figure caption suggests. Surprisingly, it is not at all clear where (i.e., in which panel) and how (i.e., methodologically) should one distinguish one type of sequence from another. I suggest that at least one such panel is dedicated to illustrating the difference and/or detection of these sequences in time and/or from phase precession plots. Moreover, there is significant visual crowding that makes the interpretation challenging (e.g., insert a space between G and E)

      We would like to apologize that the previous version of the manuscript seems to have given the impression that the difference between intrinsic and extrinsic sequences should be mainly illustrated in Figure 5. We hope that our revisions of Figures 1 and 3 have made this point sufficiently clear. The main purpose of Figure 5 was (and is) to illustrate how intrinsic sequences can lead to out-of-field firing. We have modified the figure caption (and text) accordingly. To address the visual crowding problem in Figure 5, we have inserted a space between panels and also removed repeated labels.

      Tempotron neuron and Figure 6:

      From the reviewer’s questions on Figure 6, we feel that our presentation caused considerable confusion about the motivation and interpretation of the tempotron simulations. We therefore rewrote parts of the associated text and Figure caption. We hope that the revised presentation clarifies the issues. We therefore only briefly respond to the reviewer’s points here, because we think they largely resulted from misunderstandings.

      • Intuitively, and as the manuscript results suggest, late phases are associated to extrinsic mechanisms while early phases are associated to intrinsic. Why not construct a simpler classifier readout based on this fact? How does it compare to a tempotron?

      Contrary to the reviewer’s comment, extrinsic mechanisms are visible at early phases (late in the field), intrinsic mechanisms at late phases (early in the field). In fact, what the tempotron does is learn to identify the intrinsic (late phase) part and to disregard the extrinsic (early phase) part.

      • What is the significance of theta-modulated output of the tempotron (readout) neuron?

      The theta modulation of the tempotron output is a trivial result of the theta-modulation of the input, i.e., the detection of the intrinsic sequence pattern is done once every cycle.

      Suggestion for Figure 6 related to Tempotron readout: Focus on ’with DG loop condition’, as the challenge and most important point here is to identify extrinsic and intrinsic sequences. The No-loop condition could be left as a supplementary figure or side panel.

      The no-loop condition is the essential control showing that the tempotron only responds to the previously learned intrinsic pattern and cannot identify spatial location based on the extrinsic pattern.

      Further work/predictions.

      Lines 196-198. ”Since intrinsic sequences can also propagate outside the trajectory (Figure 5) and activate place cells non-locally, our model predicts direction-dependent expansion of place fields.” If remote activation is ’sufficiently’ remote, wouldn’t this predict two separate place fields instead of an expansion?

      The reviewer is completely correct. Out-of-field spiking can also affect remote locations, if the intrinsic sequences link to remote place fields. This would lead to double fields; however, the intrinsic part would only be active at late theta phases. For simplicity, we have not added such a case in our paper, but we would like to thank the reviewer for this comment, since it leads to a nice prediction of the model, which can be experimentally tested and therefore was included in the Discussion.

      Lines 556-558. ”In our model, firing rate is determined by both low-phase spiking from sensory input and high-phase spike arrivals of DG-CA3 loops, both producing opposing effects on the phase distribution.” Is it possible to make a differential prediction based on lesions here, e.g., along the lines of reduced range phase precession, for either high phases or for low phases?

      We thank the reviewer for this great suggestion. Lesion of DG in the model does indeed reduce the phase range and mean spike phase. This further corroborates the effect of DG-loop on theta compression and high-phase spiking. We have included a new panel D in Figure 4 and a corresponding mention in the result section.

      Line 570. ”We speculate that the functional roles of intrinsic sequences may not be limited to spatial memories.”. Is there any relationship to replay and/or sleep-dependent memory consolidation? Some speculation in the Discussion section would be welcome and appropriate.

      We have added some further speculative ideas to the last section of the Discussion. We propose that replay and preplay reflect the intrinsic sequences that express the current expectation of the animal. We have not yet thought well enough about their relation to memory consolidation to phrase this in the manuscript, but would suggest that they could serve to signal multimodal context information to the neocortex where it can evoke retrieval of unimodal memory traces.

      The description of the results, as stated in the public review, can be improved. A key component is the definition and identification of extrinsic and intrinsic sequences.

      Some comments:

      • I think that the words ’extrinsic’ and ’intrinsic’ are problematic as both types of sequences/models rely on external (spatial) input, hence both are in some sense ’extrinsic’. On the other hand, both are network mechanisms, thus in some sense ’intrinsic’, where the asymmetry is either programmed directly onto the weights or due to synaptic depression. To add to the confusion, ’intrinsic’ mechanisms very often refer to cellular mechanisms in neurophysiology. I kindly ask you to, ideally, reconsider the terminology, or at the very least, be very thorough and precise when describing the mechanisms. For example, sometimes extrinsic (intrinsic) ’models’ are referred to, sometimes ’sequences’, sometimes ’factors’, sometimes ’pairs’, etc.

      We understand and appreciate the reviewer’s argument, but would like to stick to the terminology, since it was already used in our prior publication. We have made considerable effort to improve the explanation and illustration of extrinsic vs. intrinsic pairs in the main text, Figure 1 and 3 to highlight our definition that is based on pair correlations: Extrinsic pairs flip the correlation lag with reversal of running direction, intrinsic pairs don’t. This is simply a functional definition and should not be confused with potential microscopic mechanisms. One of those (DG-loops) is suggested in our paper.
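
      The functional definition can be reduced to a sign test on the correlation lag measured for the two running directions. The sketch below is a deliberately simplified illustration of that idea; the manuscript's actual extrinsicity and intrinsicity measures are not specified here, and the function name, arguments, and the `tolerance` parameter are assumptions introduced for this example.

```python
def classify_pair(lag_forward, lag_reverse, tolerance=0.0):
    """Label a cell pair from its correlation lags in the two running directions.

    lag_forward / lag_reverse: peak lag of the pair's cross-correlation histogram
    when the animal runs in one direction and in the opposite direction.
    A lag whose magnitude is within `tolerance` of zero is treated as undetermined.
    """
    if abs(lag_forward) <= tolerance or abs(lag_reverse) <= tolerance:
        return "undetermined"
    same_sign = (lag_forward > 0) == (lag_reverse > 0)
    # Same firing order in both directions -> intrinsic; reversed order -> extrinsic.
    return "intrinsic" if same_sign else "extrinsic"


if __name__ == "__main__":
    print(classify_pair(0.020, -0.015))  # firing order flips with direction
    print(classify_pair(0.020, 0.018))   # firing order is preserved
```

      The test only asks whether the firing order of the pair is preserved or reversed, which is why it says nothing about the underlying microscopic mechanism.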

      • As discussed in the public review, network mechanisms may require experience-dependent plasticity and hence cannot easily explain phase precession on the first pass. Please discuss why and/or how your model fits with this observation.

      We agree that the two models under consideration both require the recurrent network to be set up appropriately and there is no theory so far that would explain how. The reason we chose these two models is because they are well known in the community and relatively similar. We reasoned that comparison between an intrinsic model and an extrinsic model would make most sense if the two are as similar as possible. Nevertheless, we extended the manuscript by a new set of simulations in which we do not use recurrent CA3 connections and obtain phase precession solely by feed-forward synaptic facilitation (new Figure 6 and supplementary Figure S1). The new simulations show that the basic phenomenology can also be obtained without using recurrent CA3 connections; however, as expected when removing one mechanism of phase precession, the phase range is somewhat reduced as compared to the full model.

      Along a similar vein, phase precession in Figure 1E only has a range of pi/2, which is about half of the typical range of phase precession for single runs. This should be characterized as a weakness of the intrinsic model.

      The precession range in spiking models is highly sensitive to a large number of parameters such that it is hard to make such definite claims (see also above response). In the original Tsodyks et al. 1996 paper the phase range went up to 270 degrees with a slightly different implementation to ours in terms of current vs. conductance-based synapses, an exponential instead of a Gaussian recurrent weight function, and 1-d (original) vs 2-d (ours). We chose conductance-based synapses, and a Gaussian weight profile for better comparison with the Romani and Tsodyks (2015) model. In the original non-spiking implementation by Romani and Tsodyks (2015), the phase range was hardly 70 degrees. Our model implementation of the Romani and Tsodyks (2015) model fits the experimentally reported phase ranges of about 70 to 180 degrees in CA3 (Harris et al., 2001).

      Lines 282-284: ”...since phase precession properties change in relation to running directions, nor are they solely intrinsic since reversal of correlation is still observed in most of the sequences (Huxter et al., 2008; Yiu et al., 2022).”. To which extent is this a consequence of the phase precession model (extrinsic vs intrinsic) or the fact that place fields are sometimes directional?

      The reversal of sequences with reversed running direction is how we define extrinsic correlation. We hope our changes in relation to Figure 1 has clarified this point.

      Figure 2: Is it i) directional input or ii) short-term facilitation that gives rise to lower phase? (or perhaps both?) Please clarify.

      It’s both. This is now clarified in the revised version of the Results sections related to Figure 2: higher depolarization always yields earlier phases in spiking models, however, pair correlations are not affected by either of the two mechanisms.

      Line 320. ”...onset of phase precession”. Do you mean in CA3/CA1/DG?

      Thank you for pointing this out. We have clarified that this statement refers to CA3.

      Line 323. ”....at a different location”. Please add rationale why it has to be at a different location and a reference to the appropriate equation.

      The sequence rationale as well as the equation number have been added.

      Line 384. ” ... predicting that loss of DG inputs is compensated for by the increase of release probability in the spared afferent synapses from the MEC.”. It wasn’t clear whether this was a ’homeostasis prediction’, or an implementation in the model. Please clarify.

      Since the model explained the experimental observations by implementing an increased probability of release, the model predicts that in animals with DG lesion the probability of release should be enhanced. We have modified the wording to avoid confusion.

      Line 428 ”...and near future locations) is obvious, the potential role of the lesser expressed intrinsic sequence contributions is not straightforward.”. Similar to my comments above regarding terminology, please clarify what are both contributions and why are intrinsic sequences ’lesser expressed’.

      We have rewritten this passage to avoid unclear wording.

      Line 474. ”...we showed that the trajectory-independent sequences”. Do you mean ’intrinsic sequences’?

      We thank the reviewer for careful reading! We have changed the wording to ”intrinsic sequences” in the revision.

      Line 482. ”...field pairs being extrinsic”. Please clarify, as the usage of extrinsic now refers to field pairs.

      Thank you for pointing this out. We went through the whole manuscript and clarified the terms.

      Line 245 (heading). Consider rewriting as ’Dependence of theta sequences on heading directions’. Extrinsic and Intrinsic models have not yet been introduced.

      Since the main purpose of the first Results section is to explain the difference between extrinsic and intrinsic sequences we kept these terms in the heading but modified it to ”Dependence of theta sequences on heading directions: Extrinsic and intrinsic sequences”. Additionally, we have put more emphasis on introducing the terms ”extrinsic” and ”intrinsic” in this section.

      Figure 1.

      • I suggest using the same font - C and D, and F and G are too close to each other, consider adding space. For example, the exponent, 10^-2, makes reading cumbersome. Line 300. Phase tail means offset phase? Phase tail may be too informal. Line 325: DG loop. Do you mean CA3-DG projection?

      We thank the reviewer for the suggestions. In the revised manuscript, we have ensured that the same font is used in all of the figures. To improve the readability of Figure 1, we have added space between panels as suggested, removed repeated axis labels and downsized the text ”10^-2”. Furthermore, we have rewritten the referenced line without using the word ”tail”, and also clarified the meaning of DG loop as the short form of CA3-DG projection.

      Figure 4 caption: ”DG lesion reduces temporal correlations...”. It is more precise to say that the lesion reduces the slope of the fitted lag vs distance. And how is this related to sequence compression?

      In the paragraph referring to Figure 4, we have elaborated on the meaning of theta compression and its relation with the lag-distance plot. However, we argue that ”reduces the slope of the fitted curve” is not comprehensive enough to express our summarized conclusion in a caption title. We have modified the wording to be ”DG lesion reduces theta compression”.

      In addition, we have changed the slope unit to be radians per cm rather than radians per maximum pair distance, in conformity to unit standards.

      General comment about terminology with regards to tuning and connectivity: it is not formally correct to compare connectivity with trajectories (e.g., lines 388-395, caption of Figure 5A, etc). Perhaps compare tuning to particular directions/preference or receptive field?

      We have corrected the wording such that the direction of DG-loop projection is compared to the direction of trajectory.

      Line 470. ’...fixed recursive loop.” Sentence is not clear, do you mean recurrent loops?

      The reviewer is correct. We corrected the wording.

      Reviewer #2 had the following recommendations.

      M1. The abstract focuses on the differences between online and offline hippocampal replays. However, the replay topic is not touched upon in the rest of the manuscript. I found this very confusing when I first read the pa- per. I suggest the authors reconsider the best way to approach the opening or at least discuss if and how their model would incorporate replay phenomena.

      Also in response to Reviewer #1 we have rewritten the abstract focusing on the problem of how to generate 2-d topology from 1-d sequences. In addition, also in response to Reviewer #1, we added a paragraph in the discussion detailing a hypothesis on how we think replay and intrinsic sequences work together.

      m2. On lines 89-91, the authors provide the selection of neuronal parameters for excitatory pyramidal cells and inhibitory cells in the Izhikevich model. While the choice of model is reasonable, it would be helpful to clarify the source of these neuronal parameters, especially for readers who are not familiar with the model.

      Again, also in response to reviewer # 1, we have added more motivation for the Izhikevich model.

      M3. On lines 94-98, the model considers a 2D sheet of CA3 neurons. One of the most significant assumptions is that each 2x2 tile of place cells is considered a unit with four directional angles. What is the basis for this assumption? Is there any experimental result supporting this, or is it a completely artificial design for the model? This is important since the organization of CA3 cells also affects the network architecture discussed later and impacts the realism of the model.

      This comment is related to Reviewer #1’s concern on experience-dependent plasticity: How is this connectivity pattern established? We fully agree that this is an open problem for the Tsodyks et al.-type networks. The main reason for choosing them (as argued in our response to Reviewer #1) is to have two published models, representing one type of sequence each, that are similar enough for comparison. In addition, we added new simulations (new Figure 6 and Supplementary Figure S1), showing that the basic phenomenology can also be obtained in a model without recurrent connections (see also response to Reviewer #1).

      m4. Similarly, on lines 111 and 140, the model uses 500 ms for the timescales of short-term facilitation and short-term synaptic depression. The choices of these two timescales are vital for producing directionality in extrinsic and intrinsic sequences, yet their experimental sources are not clarified.

      In the Methods section of the revised manuscript, we have included the sources of previous experimental data and modelling work to support our choice of the time constants.
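
      To illustrate why these time constants matter, the sketch below implements a standard Tsodyks-Markram-style short-term plasticity update, in which a utilization variable u (facilitation, time constant tau_f) and a resource variable x (depression, time constant tau_d) are updated at each presynaptic spike. This is a generic textbook formulation and an assumption on our part; the manuscript's exact equations are not reproduced here. The 500 ms value follows the time scale mentioned in the comment above, while the baseline utilization `U`, the 50 ms value, and all names are illustrative.

```python
import math


def release_per_spike(spike_times_s, U, tau_f_s, tau_d_s):
    """Return the relative synaptic release (u * x) evoked by each presynaptic spike.

    Generic Tsodyks-Markram-style update (illustrative, not the manuscript's code):
    between spikes u relaxes to U and x recovers to 1; at a spike u is incremented
    first, then a fraction u of the available resources x is released.
    """
    u, x = U, 1.0
    last_t = None
    releases = []
    for t in spike_times_s:
        if last_t is not None:
            dt = t - last_t
            u = U + (u - U) * math.exp(-dt / tau_f_s)       # facilitation decays
            x = 1.0 - (1.0 - x) * math.exp(-dt / tau_d_s)   # resources recover
        u = u + U * (1.0 - u)   # spike-triggered facilitation
        r = u * x               # released fraction
        x = x - r               # spike-triggered depression
        releases.append(r)
        last_t = t
    return releases


if __name__ == "__main__":
    spikes = [0.0, 0.05]  # two spikes 50 ms apart
    # Slow facilitation (500 ms), fast recovery: the second response grows.
    print(release_per_spike(spikes, U=0.1, tau_f_s=0.5, tau_d_s=0.05))
    # Slow recovery from depression (500 ms), fast facilitation decay: it shrinks.
    print(release_per_spike(spikes, U=0.5, tau_f_s=0.05, tau_d_s=0.5))
```

      With a slow facilitation time constant, spikes arriving within a few hundred milliseconds of each other evoke progressively larger responses, whereas a slow depression time constant has the opposite effect; which of the two dominates depends on the parameters.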

      M5. On line 126, the authors assume that the synaptic strengths between CA3 cells, Wij, are given by the distances between neurons and the similarity between their directional preferences. While this assumption seems reasonable in the sensory cortex, I am unsure if this is also the case in the hippocampus, and the authors should clarify the basis for this assumption.

      The distance dependence simply reflects the original Romani and Tsodyks 2015 model (see response to M3) and we share the concern of the reviewers. The increased connectivity for neurons with the same directional preference was necessary to recover the direction-dependent phase precession properties (Figure 2) in the realm of the Romani and Tsodyks 2015 model. Please also see our new Figure 6 showing simulations without the recurrent matrix.

      More importantly, the existing connections within CA3 and DG cells completely determine the ”intrinsic” sequences. But wouldn’t this be fragile when place cells undergo global remapping, which can take place within only a few seconds? The author should comment on this in the discussion.

      We would like to thank the reviewer for bringing up this interesting point. In our thinking, the DG-CA3 connectivity is fixed (multiple 1-d trajectories, not necessarily requiring 2-d topology), i.e., the same intrinsic sequence should show up in multiple environments (and should not remap), although it may just not be active in some environments. This is a prediction of our model and we have added it to the Discussion.

      M6. I found the setup of DG place cells unreasonable. DG place cells are found to be granule cells rather than pyramidal cells. Moreover, the model does not consider recurrent connections between DG cells (These setups are closer to CA1 place cells).

      We agree with the reviewer, DG granule cells should rather be modelled as high-input resistance EIF neurons. However, the feedback loop via the dentate is not a direct one. It involves hilar mossy cells plus multiple hierarchies of feedback inhibition (this is probably what the reviewer means with recurrent connections between DG neurons, because granule cells are not recurrently connected in the non-pathological state). To our knowledge a biologically realistic model of the hilar-DG network does not exist and it would be far beyond the scope of this paper to develop one. We therefore see our DG feedback model rather as phenomenological. The discussion paragraph on the anatomy of the dentate gyrus touches on these points.

      Therefore, a significant concern is: Why should it be the DG feedback projection to CA3 responsible for the ”intrinsic” sequences instead of projections from other brain areas?

      The reviewer is generally correct: any brain structure which implements fixed sequences via a loop would do. The reason why we suggest the DG to be the best candidate is purely empirical, referring to papers with dentate lesions: Sasaki et al. 2018 and Ahmadi et al. 2022. We have added a similar argument to the discussion.

      m7. On line 166, the authors claim that there are no connections between inhibitory cells at all. While I understand that this is for simplification of the model, the lack of recurrent inhibition between interneurons may have limited the model’s ability to produce gamma-band dynamics (referring to PING and ING mechanisms), which are robust rhythms produced in CA3. I am very curious if the model can incorporate theta-gamma coupling by introducing connections between CA3 inhibitory cells.

      We have omitted the gamma oscillation for simplicity, because we do not have a hypothesis for a functional role in the context of distinguishing extrinsic from intrinsic sequences (Occam’s razor) and, as the reviewer correctly anticipates, they unavoidably show up when inhibitory interneurons connect to each other (e.g. Thurley et al. 2013). Of course, one could envision situations in which gamma for intrinsic sequences may have a different frequency than for extrinsic ones, by differentially manipulating the CA3 and DG basket cell networks, but, as long as there is no experimental data, it would be pure speculation and thus we have not included it in the model.

      m8. The authors should clarify the source of parameters in Table 1, especially the synaptic strengths. These values are vital for extrinsic and intrinsic theta sequences.

      The weight values have been chosen to allow for a large theta phase precession range, coexistence of extrinsic and intrinsic sequences, and stability of the network activity. A similar statement has been added to the manuscript.

      M9. I have another concern regarding the measurements of ”extrinsicity” and ”intrinsicity” defined on lines 185-196. Are they the best measures? To distinguish the cause of spike correlations, the ”extrinsicity” and ”intrinsicity” of a pair of spikes should not be high at the same time. However, this is clearly not the case in the model, according to Figs 3 and 5. Moreover, in the data analysis carried out later, spike pairs are considered extrinsic or intrinsic merely by comparing the two measurements. I suggest the authors consider counterfactual methods in causal inference. For example, would a spike pair (cell1, cell2) still exist if we change the sensorimotor inputs or the DG-CA3 projections? If this is difficult to implement, the authors should at least discuss how different choices of measurements would impact the conclusions of the paper.

      The problem the reviewer has identified arises from the fundamental symmetry of theta phase quantification: if spikes of a pair of place fields have a phase difference of 180°, one cannot say which cell leads and which cell follows; hence, the phase difference is both intrinsic (because the peak doesn’t flip) and extrinsic (because the peak flips and ends up at the same phase). The fact that in some cases extrinsicity as well as intrinsicity are high simply means that the field pair has a correlation peak lag close to 180°. Since in the experimental data set in (Yiu et al. 2022) only field pairs were available, we were not able to use a different quantification at that time and decided to apply the same quantification in our model for comparison. Moreover, Figure 5F nicely shows that the measures are able to retrieve the ground-truth intrinsic DG-loop structure when considered on the population level.

      In our model, though, we can go beyond 2nd-order statistics and derive sequence similarity measures including multiple cells, e.g., Chenani et al. 2019. However, since we already know the ground truth by construction, we decided not to use these methods. We added a paragraph in the discussion elaborating on sequence quantification beyond 2nd order.

      m10. The authors begin discussing ”intrinsic sequences” from line 316. However, it is not defined before that (and in the rest of the paper as well), causing confusion when reading the paper. The exact definitions of extrinsic and intrinsic sequences should come earlier.

      We hope that our changes to the beginning of the Results section (Figure 1), also requested by Reviewer #1, resolve the confusion.

      m11. On lines 345-347, the authors claim that ”the intrinsic sequences are played out backward as determined by the direction of fixed recurrence (Figure 3F),” which is vague. If such sequences are present in that panel, it should be more explicitly indicated graphically.

      Also in response to Reviewer #1, we have graphically highlighted the two types of sequences.

      M12. On lines 309, 356, 484, 495, 515, and possibly other instances, the authors repeatedly claim that the model simulations are in ”quantitative agreement” with their previous experimental paper. However, no experimental data or comparison with the simulations are presented in this paper. The authors should at least create one figure to demonstrate the degree of consistency between them, instead of merely asking the reader to refer back to their previous paper.

      We agree with the reviewer that the experimental data of our previous paper should be presented in the manuscript. However, creating more panels or figures is likely to clutter the already crowded visuals and obscure our main message. We therefore decided to give numerical comparisons to the previous findings in the main text whenever appropriate, specifically, in the sections referring to Figures 2, 3 and in the Discussion.

    1. Author response

      Reviewer #1 (Public Review):

      The potential role of the CaMKII holoenzyme in synaptic information processing, storage, and spread has fascinated neuroscientists ever since it has been described that self-phosphorylation of CaMKII at T286 (pT286) can maintain the kinase in an activated state beyond the initial Ca2+ stimulus that induced kinase activation and pT286. The current study by Lučić et al utilizes biochemical and biophysical methods to re-examine two pT286 mechanisms and finds:

      (1) that a previously proposed activation-induced subunit exchange within the holoenzyme cannot provide pT286 maintenance or propagation; and

      (2) that pT286 can occur not only within a holoenzyme but also between two holoenzymes, at least at sufficiently high concentrations.

      For the observation regarding the subunit exchange, the authors go above and beyond to demonstrate that a previously proposed activation-induced subunit exchange does not actually occur in their hands and that the previous appearance of such a subunit exchange may instead be due to activation-induced interactions between the kinase domains of separate holoenzymes. This provides important clarification, as the imagination about the possible functions of this subunit exchange has been running wild in the literature.

      By contrast, pT286 between holoenzymes at sufficiently high concentrations was largely predicted by the previously reported concentration-dependence of pT286 between monomeric truncated CaMKII (although these previous experiments did not rule out that such pT286 could have been excluded for intact full-length holoenzymes). Notably, the reaction rate reported here for pT286 between two holoenzymes is more than two orders of magnitude slower compared to the previously described rate of the pT286 reaction within a holoenzyme.

      The only point on which we disagree (and we think it’s unarguable) is that the current consensus is that inter-holoenzyme phosphorylation simply doesn’t happen (whether or not monomers can phosphorylate each other). The reviewer is of course right that this view seems now less and less likely. We now performed new experiments to investigate this critical point further (see below).

      The probable reason for the discrepancy in reported half-time of phosphorylation measured in earlier reports and in our paper is the fact that earlier reports (for example Bradshaw et al., 2002) measured autophosphorylation rate of wild-type CaMKII holoenzymes, at catalytically-competent enzyme concentrations of 0.1-5 µM. We are reporting the phosphorylation rate of 4 µM kinase-dead CaMKII, which is only a substrate, by 10 nM catalytically competent enzyme (CaMKII wild-type). There is up to 500 times less catalytically competent enzyme in our reactions, which is probably the reason why the reaction itself is several orders of magnitude slower.
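
      The enzyme-concentration argument above can be checked with a few lines of arithmetic. This is a minimal sketch using only the concentrations quoted in this response; the variable names are illustrative.

```python
# Catalytically competent CaMKII in earlier reports (e.g. Bradshaw et al., 2002): 0.1-5 µM.
earlier_enzyme_nm = (100.0, 5000.0)   # expressed in nM
# This study: 10 nM wild-type enzyme acting on 4 µM kinase-dead substrate.
this_study_enzyme_nm = 10.0
substrate_nm = 4000.0

fold_less_enzyme = tuple(c / this_study_enzyme_nm for c in earlier_enzyme_nm)
substrate_per_enzyme = substrate_nm / this_study_enzyme_nm

print(f"{fold_less_enzyme[0]:.0f}- to {fold_less_enzyme[1]:.0f}-fold less active enzyme")
print(f"{substrate_per_enzyme:.0f} substrate subunits per active enzyme subunit")
```

      The upper bound reproduces the "up to 500 times" figure, and the second number shows that each active subunit faces a 400-fold excess of kinase-dead substrate.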

      In summary, this study contains two somewhat disparate parts: (1) one technical tour-de-force to provide evidence that argues against activation-induced subunit exchange, which was a tremendous effort that provides influential novel information, and (2) another set of experiments showing the somewhat predictable potential for pT286 between holoenzymes, but without indication for the functional relevance of this rather slow reaction. Unfortunately, in the current/initial title of the manuscript, the authors chose to emphasize the weaker part of their findings.

      We agree with the reviewer that the title should be modified to emphasize both findings of our study. We also hope that our new experiments do bolster our findings with regard to pT286 between holoenzymes, as the reviewer puts it.

      The seemingly slow inter-holoenzyme phosphorylation is only slow under conditions in which one of the proteins is kinase-dead. In a situation in which all CaMKII holoenzymes are wild-type and therefore capable of performing phosphorylation (both intra- and inter-holoenzyme), the reaction rates for pT286 are expected to be orders of magnitude faster than those reported here for the phosphorylation of T286 on kinase-dead protein.

      Reviewer #2 (Public Review):

      This well-written manuscript provides a technical tour-de-force to provide a novel mechanism for sustaining CaMKII autophosphorylation through an interholoenzyme reaction mechanism the authors term inter-holoenzyme phosphorylation (IHP). The authors use molecular engineering to create designer molecules that permit detailed testing of the proposed interholoenzyme reaction mechanism. By catalytically inactivating one population of enzymes, they show using standard assays that the inactive enzyme can be phosphorylated by active holoenzymes. They go on to show that in cells, the inactive enzyme is phosphorylated only in the presence of co-expressed active CaMKII and that this does not appear to be due to active and inactive subunits mixing within the same holoenzyme. The authors suggest reasons for why previous experiments failed to expose IHP and in some experiments provide evidence that reproduces and then extends earlier studies. Some noted differences from earlier experiments are the reaction temperature, the time course of the reactions, and that significantly higher concentrations of the inactive (substrate) kinase in the present study amplify the IHP. These are plausible reasons for earlier studies not finding significant evidence for IHP and the presented data is well-controlled and of high quality.

      The authors then take on the idea of subunit exchange employing multiple strategies. Using genetic expansion, they engineer an unnatural amino acid into the hub domain of the kinase (residue 384). In the presence of the photoactivatable crosslinker BZF and UV illumination, a ladder of subunits was generated indicating intraholoenzyme crosslinks were established. Using this cross-linked enzyme, presumably incapable of subunit exchange, the authors show significant phosphorylation of the kinase-dead mutant. This further supports that IHP is the cause of phosphorylation and not subunit exchange. Extending these experiments, they could not find evidence when CaMKIIF394BZF was mixed with the kinase-dead mutant and exposed to UV light, that there was evidence of the kinase-dead subunits exchanged into CaMKIIF394 (active) enzymes.

      Just a note, instead of residue 384, this should read 394.

      With an entirely different approach, the authors use isotopic labeling of different pools of wt CaMKII (N14 or N15) followed by bifunctional cross-linking and mass spec to assess potential intra- and interholoenzyme contacts. Several interesting findings came of these studies detailed in Figure 4, mapped in detail in Figure 5, and extensively documented in supplementary tables. Critically, numerous crosslinks were found between different domains of the enzyme (catalytic, regulatory, hub) that are themselves a nice database of proximity measurements, but critical to the hypothesis, no heterotypic cross-links were found in the hub domains at any activated state or time point of incubation. This data supports two findings, that catalytic domains come into close proximity between holoenzymes when activated, supporting the potential for IHP, but that no subunit exchange occurs.

      The authors then pursue the approach used originally to provide evidence of subunit mixing, single molecule-based fluorescence imaging. Using pools of CaMKII labeled with spectrally separable dyes, the authors reproduce the earlier findings (Stratton et al, 2016) showing that under activating conditions, but not basal conditions, colocalized spots were detected. Numerous controls were done that confirm the need for full activation (Ca2+/CaM + Mg2+/ATP) to visualize co-localized CaMKII holoenzymes. Extending these studies, the authors mix holoenzymes, fully activate them, and after sufficient time for subunit exchange (if it occurs), the reactions were quenched, and then samples were analyzed. The result was that no evidence of dual-colored holoenzymes was present; if subunits had mixed between holoenzymes, dual-colored spots should have been evident after quenching the reactions. This was not the case. Further, experiments repeated with pools of differentially labeled kinase dead enzymes produced no colocalization, as predicted, if activation of the catalytic domains is necessary to establish IHP.

      Finally, the authors employ mass photometry to investigate the potential for interholoenzyme interactions. At basal conditions, only a mass peak consistent with CaMKII dodecamers was evident. Upon activation, a small fraction of dimeric complexes was evident (with Ca2+/CaM bound) but the majority of the peak was a dodecamer with 12 associated CaM molecules, and importantly, a significant fraction of a mass population was found consistent with a pair of holoenzymes with associated CaM. As an aside, the holoenzyme population appeared to be modestly destabilized as evidence of a minor fraction of dimers appeared as the authors diluted the enzyme, but the pools of holoenzyme and pairs of holoenzymes (with CaM) remained the dominant species when activated under all three enzyme concentrations assessed. Supporting the importance of activation for interactions between holoenzymes, the catalytically dead kinase even under activating conditions, shows no evidence of dimers of holoenzymes.

      Each of the approaches is well-controlled, the data is of uniformly high quality, and the authors' interpretations are generally well-supported.

      We are very grateful for these supportive comments.

      Reviewer #3 (Public Review):

      CaMKII is a multimeric kinase of great biologic interest due to its crucial roles in long-term memory, cardiac pacemaking, and fertilization. CaMKII subunits organize into holoenzymes comprised of 12-14 subunits, adopting a donut-like, double-ringed structure. In this manuscript, Lucic et al challenge two models in the CaMKII field, which are somewhat related. The first is a longstanding topic in the field about whether the autophosphorylation of a crucial residue, Thr286, can be phosphorylated between intact holoenzymes (inter-holoenzyme phosphorylation). The second is a more recent biochemical finding, which tested the long-running theory that CaMKII exchanges subunits between holoenzymes to create mixed oligomers. These two models are connected by the idea that subunit exchange could facilitate phosphorylation between subunits of different holoenzymes by allowing subunits to integrate into a different holoenzyme and driving transphosphorylation within the CaMKII ring. Here, the authors attempt to show that one intact holoenzyme phosphorylates another intact holoenzyme at Thr286. The authors also provide evidence suggesting that subunit exchange is not occurring under their conditions, and therefore not driving this phosphorylation event. The authors propose a model where instead of exchanging subunits, two holoenzymes interact via their kinase domains to enable transphosphorylation at Thr286 without integrating into the holoenzyme structure. In order for the authors to successfully convince readers of all three facets of this new model, they need to provide evidence that 1) transphosphorylation at Thr286 happens when subunit exchange is blocked, 2) subunit exchange does not occur under their conditions, and 3) there are interactions between kinases of different holoenzymes that lead to productive autophosphorylation at Thr286.

      Strengths:

      The authors have designed and performed a battery of cleverly designed and orthogonal experiments to test these models. Using mutagenesis, they mixed a kinase-dead mutant with an active kinase to ask whether transphosphorylation occurs. They observe phosphorylation of the kinase-dead variant in this experiment, which indicates that the active kinase must have phosphorylated it. A few key questions arise here: 1) whether this phosphorylation occurred within a single CaMKII holoenzyme ring (which is the canonical mechanism for Thr286 phosphorylation), 2) whether the phosphorylation occurred between two separate holoenzyme rings, and 3) why was this not observed in previous literature? To address questions 1 and 2, the authors implemented an innovative strategy introducing a genetically encoded photocrosslinker in the oligomerization domain, which when crosslinked using UV light, should lock the holoenzyme in place. The rate of phosphorylation was the same when comparing uncrosslinked and crosslinked CaMKII variants, indicating that phosphorylation is occurring between holoenzymes, rather than through a subunit exchange mechanism that would require some type of disassembly and reassembly (presumably blocked by crosslinking). The 3rd question remains as to why this has not been previously observed, as it has not been for lack of effort. The authors mention low temperature and low concentration as culprits, however, Bradshaw et al, JBC v. 277, 2002 carry out a series of careful experiments that indicated that autophosphorylation at T286 is not concentration-dependent (meaning that the majority of phosphorylation occurs via intra-holoenzyme), and this is done over a concentration and temperature range. It is possible that due to the mutants used in the current manuscript, it allows for the different behavior of the kinase-dead domains, which will have an empty nucleotide-binding pocket. Further studies will need to elucidate these details, and importantly, understand what physiological conditions facilitate this mechanism.

      We thank the reviewer for their assessment of our work.

      The paper cited by the reviewer (Bradshaw et al, JBC v. 277, 2002) is indeed a carefully designed biochemical investigation of CaMKII activity. As the reviewer pointed out, one of the conclusions of the paper is that the autophosphorylation of CaMKII is not concentration dependent, implying that it has to occur exclusively intra-holoenzyme. However, there are some limitations which colour the interpretation of this classic paper. Bradshaw and colleagues used only CaMKII wild-type protein, so the autophosphorylation which is taking place in their reactions is possible both within holoenzymes and between holoenzymes, but this is impossible to distinguish. The authors of the cited paper then used “Autonomous activity assay” (not any measurement of pT286 on CaMKII itself) in which they first stopped the initial autophosphorylation reaction at T286 by adding a quench solution which contained a mixture of EDTA and EGTA, and then measured phosphorylation of the peptide-substrate of CaMKII (autocamtide-2), in the absence of Calmodulin binding (autonomous activity). They also diluted the autophosphorylation reaction to 10 nM CaMKII before adding it to the “Autonomous activity assay”.

      As a side point, each reaction was quenched and diluted to the same final CaMKII concentration of 10 nM. They measured the activity of this dilution with phosphorylation of a peptide-substrate (autocamtide-2), in the absence of CaM binding. The authors contend that autonomous activity reported in this way reflects the amount of pT286, which is not impossible, but it is not a direct measure of pT286.

      All this adds up to allowing the autophosphorylation of wild-type CaMKII at various concentrations ranging from 0.1 to 4.6 µM in the presence of 10 µM Ca/CaM and 500 µM Mg/ATP. This is a very fast reaction: concentrations of enzyme (CaMKII wild-type), activator (Ca/CaM) and ATP/Mg are all high at the beginning of the autophosphorylation reaction and would be expected to allow for maximal autophosphorylation in very short times (seconds). Most importantly, this experiment does not exclude an inter-holoenzyme reaction slower than the intra-holoenzyme one. It certainly could not detect it.

      In any case, to relate these concepts to our experiments and current understanding of CaMKII, we performed a new set of experiments modelled on the Bradshaw paper. Critically, we used CaMKII wild-type as the enzyme, and CaMKII kinase-dead as the substrate. Intra-holoenzyme phosphorylation cannot occur in this reaction, which was designed to detect a concentration-dependent phosphorylation reaction. We used a fixed concentration of the substrate kinase (4 µM), and 4 different concentrations of CaMKIIWT ranging from 0.5 to 100 nM. In our assay, the level of phosphorylation on substrate CaMKII (CaMKIIKD) was dependent on the concentration of enzyme CaMKII (CaMKIIWT) (Figure 1-figure supplement 3), adding more evidence to the hypothesis that CaMKII autophosphorylation can occur inter-holoenzyme.

      The possibility that an empty nucleotide-binding pocket is influencing the phosphorylation status of T286 in the regulatory domain of kinase-dead CaMKII is highly unlikely. One could maybe envision that an empty nucleotide-binding pocket might expose the regulatory domain in kinase-dead CaMKII for phosphorylation, which would be prevented in CaMKIIWT, but in all available structures of CaMKII (Chao et al, 2011; Myers et al., 2017, Buonarati et al., 2021), the regulatory domain is docked to the kinase domain of CaMKII, although the nucleotide-binding pocket is empty (either by mutation of residue K42 and/or simply by not adding the ATP/Mg to reduce chemical dispersity of the sample). The only time the regulatory domain was not docked on the kinase domain is when CaMKII was in complex with Calmodulin (Rellos et al., 2010). Finally, in our crosslinking mass spectrometry experiments, we used both heavy and light forms of CaMKII wild-type, and there we can clearly see interactions between kinase/regulatory domains of two different species of CaMKIIWT, which are dependent on activation.

      The most convincing data that subunit exchange does not occur is from the crosslinking mass spectrometry experiment. The authors created mixtures of 'light' and 'heavy' CaMKII holoenzymes, either activated or not and then used a Lys-Lys crosslinker (DSS) to trap the enzyme in its final state. The results of this experiment indicate that subunit exchange is not occurring under their conditions. A caveat here is that there are not many lysines at hub-hub interfaces, which is the crux of this experiment. If there is no subunit exchange under their conditions, how does transphosphorylation occur between holoenzymes? The authors show very nice mass photometry data indicating that there are populations of 24-mers, which corresponds to a double-holoenzyme. Paired with the data from their crosslinking mass spectrometry which shows crosslinks between kinase domains of different holoenzymes, this indicates that perhaps kinases between holoenzymes do interact, and they do so in a competent manner to allow transphosphorylation to occur.

      It is true that there are “only” 6 Lysines in the hub domain of CaMKII. However, it is clear from our crosslinking mass spectrometry data that we can detect hub:hub peptides coming from the same holoenzymes (homocrosslinks, either 14N: 14N or 15N: 15N species), but never between holoenzymes (14N with 15N). The fact that peptides can be detected in the homocrosslinks speaks to the validity of using Lysine crosslinkers in this experiment.

      Weaknesses:

      The authors should be commended for performing three orthogonal experiments to test whether CaMKII holoenzymes exchange subunits to form heterooligomers. However, there are technical issues that dampen the strength of the results shown here. For simplicity, let's consider that CaMKII holoenzymes are comprised of two stacked hexameric rings. It has been proposed that the stable unit of CaMKII assembly and perhaps also disassembly and subunit exchange is a vertical dimer unit (comprised of one subunit from each hexameric ring). In the UV crosslinking data shown in this paper, the authors have a significant number of monomers, some crosslinked dimers (of which there are two populations), and fewer higher-order oligomers. To effectively block subunit exchange, robust crosslinking into hexamers is necessary, which the authors have not done. Incomplete crosslinking results in smaller species that can still exchange (and/or dissociate), confounding the results of this experiment. In addition, Figure 3 shows a trapping experiment, where if the exchange was occurring, there would be an oligomeric band in Lane 8, which is visible and highlighted with a blue arrow by the authors. This result is explained by nonspecific UV effects, however by eye it is not clear if there is an equivalent band in lane 10. The overall issue here is inefficient crosslinking.

      We agree with the reviewer that the robustness of the UV-induced crosslinking is not extremely high. However, we do observe higher-order oligomers on the gel (Figure 2 and Figure 3B, pT286 blot), which indicates that at least a portion of the holoenzymes is crosslinked. On the other hand, the UV-induced crosslinking is not slowing down the trans-phosphorylation reaction, which would be expected if subunit exchange were the prevailing mechanism for spread of kinase activity between holoenzymes.

      In Figure 3, lanes 8 and 10 show a small portion of dimers (less than 5% by densitometry), at the absolute limit of detection. This dimer band is most likely due to unspecific UV-induced disulfide bridging (we already lessened it by adding 50 mM TCEP prior to UV treatment; Figure 3-figure supplement 1B and C). Previous reviewers of this manuscript criticized the small dimer band in lane 8, and we wanted to address this transparently in the submission to eLife.

      Unfortunately, if we absolutely crank up the contrast to see this band in lane 10, we start to see other features in the noise as well. We have now edited the image in Figure 3B to highlight these minor bands more clearly, but this is also not ideal.

      With regard to the trapping experiment, the overall problem is not inefficient crosslinking, because we see that the pT286 signal is quite nicely represented in higher-order bands from F394BzF protein, but the kinase-dead protein (Avi-tagged signal in Figure 3) is almost entirely absent. Any crosslinking of Avi-tagged protein (possibly corresponding to subunit exchange) is a minor process at the limit of detection on WB.

      Unfortunately, we have not yet found any better crosslinking sites than the two we report (we have tried about 10). But the results we did obtain encouraged us to employ other techniques to probe subunit exchange (for example, the MS X-linking).

      The authors also employ a single-molecule TIRF experiment to further interrogate subunit exchange. Upon inspection of the TIRF images, it is not clear that the authors are achieving single molecule resolution (there are evident overlapping and distorted particles). The analysis employed here is Pearson's correlation coefficient, which is not sufficient for single molecule analysis and would not account for particle overlap, particles that are too bright, and/or particles that are too dim. For example, an alternative explanation for the authors' results is that activation results in aggregation (high correlation), and subsequent EGTA treatment leads to dissociation at these low concentrations (low correlation). However, further experimentation and analysis are necessary.

      In the manuscript we present raw images, not processed. As we wrote in the Materials and Methods, we thresholded the images for further processing. All colocalization methods have drawbacks, but we found that our thresholding combined with the Pearson coefficient was highly reproducible. We did also look at Manders coefficients, but these are less straightforward to understand, whilst still giving in our hands the same answer. We agree that there are more experiments that can be done, with particular predictions based on our new mechanism. We are doing them and will report them when they are ready.
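
      To make the two colocalization measures concrete, here is a minimal sketch that thresholds two channels and computes the Pearson and Manders coefficients. The pixel values, the threshold, and the function names are hypothetical, and images are represented as flat lists of intensities for simplicity; this is not the analysis pipeline used in the manuscript.

```python
import math


def threshold(pixels, cutoff):
    """Set every pixel below the cutoff to zero (background removal)."""
    return [p if p >= cutoff else 0.0 for p in pixels]


def pearson(a, b):
    """Pearson correlation coefficient of two equally sized pixel lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def manders(a, b):
    """M1: fraction of channel-a intensity in pixels where b is nonzero; M2: the converse."""
    m1 = sum(x for x, y in zip(a, b) if y > 0) / sum(a)
    m2 = sum(y for x, y in zip(a, b) if x > 0) / sum(b)
    return m1, m2


# Toy 8-pixel "images": two bright spots plus dim background in each channel.
green = threshold([0, 0, 9, 8, 0, 0, 1, 0], cutoff=2)
red_overlapping = threshold([0, 0, 9, 8, 0, 0, 1, 0], cutoff=2)
red_separate = threshold([7, 9, 0, 0, 0, 1, 0, 0], cutoff=2)

r_overlap = pearson(green, red_overlapping)
r_separate = pearson(green, red_separate)
m_overlap = manders(green, red_overlapping)
m_separate = manders(green, red_separate)
print(r_overlap, r_separate, m_overlap, m_separate)
```

      In this toy case both measures agree: fully overlapping spots give coefficients near 1, while spatially separate spots give a negative Pearson coefficient and Manders coefficients of 0.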

      At the risk of repeating ourselves, the reversible loss of overlap of the two labelled populations is the key result and cannot be explained by spurious dim or bright particles, or by a few overlapping profiles.
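
      To make the analysis concrete, the following is a minimal sketch of a thresholded Pearson colocalization measure in plain Python. It is not the authors' pipeline: the function names, the rule of zeroing sub-threshold pixels, and the toy intensity profiles are all assumptions chosen for illustration.

```python
from math import sqrt


def apply_threshold(pixels, thr):
    """Zero every pixel below thr (an assumed thresholding rule)."""
    return [p if p >= thr else 0.0 for p in pixels]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / sqrt(vx * vy)


def colocalization(ch1, ch2, thr):
    """Pearson coefficient of two channels after thresholding both."""
    return pearson(apply_threshold(ch1, thr), apply_threshold(ch2, thr))


# Hypothetical 1-D intensity profiles standing in for two image channels.
green = [0, 5, 9, 5, 0, 0, 0, 0]
red_overlap = [1, 6, 8, 4, 0, 1, 0, 0]  # spot at the same position
red_apart = [0, 0, 0, 0, 0, 5, 9, 5]    # spot at a different position

r_overlap = colocalization(green, red_overlap, thr=3)
r_apart = colocalization(green, red_apart, thr=3)
```

      In this toy case the coefficient is close to 1 when the two labelled populations share a spot and falls below zero when they occupy different pixels. Real images would also need a principled choice of threshold, which this sketch does not address.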

      Taken together, the authors have provided important food for thought regarding inter-holoenzyme phosphorylation and subunit exchange. However, given the shortcomings discussed here, it remains unclear exactly what mechanisms are at play within and between CaMKII holoenzymes once activated.

      We thank the reviewer for their critical assessment of our manuscript. We will continue to investigate the relevant points and refine the overall picture of CaMKII, to better clarify the mechanisms.

    1. Author Response

      Reviewer #1 (Public Review):

      Sučević and Schapiro investigated a neurobiologically inspired model of human hippocampal structure and computation in category learning. In three separate simulations, the model (C-HORSE) is presented with learning tasks defined by various category structures from prior work and evaluated for its ability to learn the category structure, generalize categorization to novel stimuli, and accurately recognize previously encountered stimuli. Although originally conceived of as a computational model of associative memory, C-HORSE is demonstrated to quite naturally account for human-like learning of the three category tasks. Notably, the authors characterize the mechanisms underlying the model's learning by way of additional simulations in which "lesions" to the model's monosynaptic pathway (MSP; direct connections between ERC and CA1) are contrasted with lesions to its trisynaptic pathway (TSP; pathway connecting ERC-DG-CA3-CA1). These in silico lesions offer key insight into the computational principles underlying theorized hippocampal functions in category learning: whereas MSP provides incremental learning of shared features diagnostic to category membership that are important for category generalization, TSP learns item-specific information that drives recognition behaviour. The authors propose that C-HORSE's successful account of a broad set of category learning datasets provides clear support for the role of complementary hippocampal functions mediated by MSP and TSP in category learning. This work adds compelling computational evidence to a growing literature linking hippocampus to a broader role in cognition that extends beyond declarative memory.

      The model simulations are clear and properly conducted. The three datasets examined offer a relatively broad set of findings from the category learning literature; that the models provide reasonable accounts of human performance in all three speaks to the model's generalizability. Overall, I find this work exciting and an important step in linking longstanding well-established formal learning theories of psychology with neurobiological mechanism. Several weaknesses dampen this excitement, each of which is detailed below:

      1) C-HORSE is presented as a new entry into a rich field of formal computational models of category learning. As noted above, the datasets examined span a broad range of learning contexts and structures and the model's ability to account for learning behaviour is compelling. However, no other models are leveraged to perform a direct evaluation. In other words, C-HORSE's predictions are compelling, but is it better than other competing models in the literature? To be clear, C-HORSE offers a novel alternative with its fundamental mechanisms originating from anatomical structure and connectivity. As such, a proof-of-concept showing that such a neurobiologically inspired framework can account for category learning behaviour is a worthwhile contribution in its own right and a clear strength of this paper. However, how to consider this model relative to existing theoretical frameworks is not well described in the manuscript.

      We very much appreciate this point — see response to Editor summary point #3 above.

      2) Relatedly, C-HORSE is evaluated in terms of qualitative fit to behaviour measures from prior studies and, in all three simulations, restricted to measures of end-of-learning performance. Again, an appeal to the proof-of-concept nature of the current work may provide an appropriate context for this paper. But a hallmark of well-established category learning models (e.g., SUSTAIN, DIVA, EBRW, SEA, etc.) is their ability to account for both end-of-learning generalization (and in some cases, recognition) and behaviour throughout the learning process. C-HORSE does provide predictions of how learning unfolds over time, but how well this compares to human measures is not considered in the current manuscript. Such comparisons would strengthen the support for C-HORSE as a viable model of category learning and help position it in the busy field of related formal models.

      We completely agree about the value of this, and we have added empirical timecourse data for comparison with all simulations, as described in response to Editor summary point #7, above.

      3) A consistent finding across all three simulations is that the TSP provides item-specific encoding. Evidence for this can be inferred by contrasting categorization and recognition performance across the TSP- and MSP-only model variants. In the discussion, the authors draw a parallel between exemplar theories of category learning and the TSP, which is a compelling theoretical position. However, as noted by the authors, unlike exemplar theories, the TSP-only model was notably impaired at categorization. The authors' suggestions for extensions to C-HORSE that would enable better TSP-based categorization are interesting. But I think it would be helpful to understand something about the nature of the representations being formed in the TSP-only model. For example, are they truly item-specific, are the shared category features simply lost to heightened encoding of item-unique features, are category members organized similarly to the intact model just with more variability, and so on. Characterizing the nature of these representations to understand the limitations of the TSP-only model seems important to understanding the representational dynamics of C-HORSE, but such a characterization is not included in the current manuscript.

      The RSA results, now included for Simulations 2 and 3 in addition to Simulation 1, provide the information needed to characterize the nature of the TSP representations. Generally speaking, they are truly item specific, meaning that each item is represented by its own distinct set of units. This is a demonstration of the classic pattern separation function of this pathway, taking similar inputs and projecting them to orthogonal populations of neurons. Simulation 1 is the clearest example of this, where there is virtually no similarity and very low variability in the item similarity structure in DG and CA3. The new Simulation 3 RSA shows us where the limit is to this pattern separation ability of the TSP, with highly typical items being represented by somewhat overlapping populations of neurons in DG and CA3. To the extent that the TSP can succeed in generalization, it seems to involve this pattern separation failure.

      We have made these points more explicit in new discussion of the RSA results:

      • Simulation 1: “In the initial response, there was no sensitivity at all to category structure in DG and CA3: items were represented with distinct sets of units. This is a demonstration of the classic pattern separation function of the TSP, applied to this domain of category learning, where it is able to take overlapping inputs and project them to separate populations of units in DG and CA3.”

      • Simulation 3: “As in the prior simulations, DG and CA3 represented the items more distinctly than CA1, and settled activity after big-loop recurrence increased similarity, especially in CA1. This simulation was unique, however, in that DG and CA3 showed clear similarity structure for the prototype and highly prototypical items. There is a limit to the pattern separation abilities of the TSP, and these highly similar items exceeded that limit. This explains why, at high typicality levels, the TSP could be quite successful on its own in generalization (Figure 5e), and why it struggled with atypical feature recognition for these items (Figure 5f).”
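
      The contrast these RSA results describe can be sketched with a toy similarity analysis. The patterns below are hypothetical binary unit activations, not output from C-HORSE, and cosine similarity is used only as an assumed stand-in for whatever similarity measure the manuscript applies.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two activation patterns."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("similarity is undefined for an all-zero pattern")
    return dot / (norm_u * norm_v)


def similarity_matrix(patterns):
    """Item-by-item similarity matrix, as used in RSA."""
    return [[cosine(p, q) for q in patterns] for p in patterns]


def mean_off_diagonal(matrix):
    """Average similarity between different items."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)


# Hypothetical patterns for three items from one category.
# Pattern-separated code: each item activates its own units.
separated = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
# Overlapping code: two shared units plus one item-specific unit.
overlapping = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
]

sep_similarity = mean_off_diagonal(similarity_matrix(separated))
ovl_similarity = mean_off_diagonal(similarity_matrix(overlapping))
```

      Here the separated code yields zero similarity between items, while the overlapping code yields a similarity of about two thirds, which is the kind of shared structure that could support generalization.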

      4) In general, a detailed description that links model mechanisms and analyses to the learning constructs of interest for the different simulations is lacking. For example, RSA results for simulation 1 are contrasted for initial and settled representations, but what is meaningful about these two timepoints is not directly stated (moreover, what initial and settled response mean in terms of the current model is not explained). The authors do briefly suggest that differences between initial and settled representations may reflect encoding dynamics before and after big-loop recurrence, but this is not established as a key metric for evaluating the nature of the model representations. In general, more motivation is needed to understand what the chosen analyses reveal about the nature of the model's learning process and representations.

      We have added more description of the motivation for our analyses. See response to Editor summary point #6 above.

      5) I appreciate the comparison in the discussion to extant models of categorization. Certainly, the exemplar and prototype models are fixtures of the category learning literature and they somewhat align with the type of learning that TSP and MSP, respectively, provide. REMERGE and SUSTAIN are also briefly mentioned, but their discussion is limited which is unfortunate as they are actually more functionally equivalent to C-HORSE. I think, however, that the authors are missing an opportunity to discuss how C-HORSE offers a means for bridging levels of analysis to connect neurobiological mechanisms with these notably successful psychological models of category learning. Rather than framing C-HORSE as a competitor to existing models, it should be viewed as an account existing on a different level of analysis. In this sense, it complements existing approaches and potentially extends a theoretical olive branch between the psychology and neuroscience of category learning.

      We love this point about bridging levels of analysis and have added it to our discussion of the model’s relationship to other models, see Editor summary point #3 above.

      6) The discussion takes a broad perspective on covering evidence concerning hippocampal contributions to category learning. Although comprehensive, some sections are not well connected back to the main thrust of the paper. For example, a section on neuropsychological accounts of the hippocampus and category learning summarizes central aspects of this literature but is never reflected on through the lens of the current findings. I do think this prior work is relevant, especially since its central theme is that the hippocampus is not necessary for category/concept learning, but its connection back to the current study is not well argued. Similarly, the section on consolidation and sleep is relevant, but in its current form does not seem to fit with the rest of the paper.

      We have implemented these suggestions through very significant revisions to the Discussion. We now better connect the sections to the main argument of the paper and made cuts throughout, including removing the section on consolidation and sleep.

      Reviewer #2 (Public Review):

      The authors present a model of the hippocampal region that incorporates both the (indirect) trisynaptic and (direct) mono-synaptic pathways from entorhinal cortex (EC) to CA1 - the former incorporating projections from EC to dentate gyrus (DG), DG to CA3, and CA3 to CA1, and exhibiting a higher learning rate. They demonstrate that exposing this network to stimuli consistent with standard empirical tests of category learning (e.g. where within-category exemplars share a set of common features) allows the network to reliably assign both novel and previously encountered stimuli to the correct category (e.g. the network can learn to classify stimuli and generalise this knowledge to new examples). They show that the tri-synaptic pathway (TSP) preferentially supports the encoding of individual exemplars (e.g. analogous to episodic memory) while the mono-synaptic pathway (MSP) preferentially supports category learning.

      The manuscript is well written, the simulation details appear sound, and the results are clearly and accurately presented. This model builds on a long tradition of computational modelling of hippocampal contributions to human memory function, strongly grounded in anatomical and electrophysiology data from both rodents and humans, and is therefore able to link phenomena at the level of individual cells and circuits to emergent behaviour - a major strength of this, and similar, work. However, I have two major concerns relating to the relationship between these findings and previously published work by the same and other authors.

      First, it is not clear to me - from the manuscript - whether these results represent a significant novel advance on previous publications from the same senior author. Figures 1 and 3D are almost identical to figures published in Schapiro et al. (2017) Phil Trans B, and the take-home message (that the MSP might support statistical learning) is the same. In brief, it seems that the authors have subjected an identical network to some new (but related) tasks and reached the same set of conclusions. I see no distinction between learning to extract 'statistical regularities' (in previous work) and learning 'the structure of new categories' (described here). As an aside, demonstrating that an autoencoder network can learn stimulus categories and generalise to new exemplars is also well established.

      We appreciate the opportunity to better articulate the novelty and importance of applying the model to the domain of category learning. There are crucial differences between statistical learning and category learning that make these simulations nontrivial (it did not have to be the case that the results would replicate for these category learning paradigms), and, importantly, many of the insights in the current work are category-learning specific (e.g., the effects of atypical features, trade-offs between generalization and recognition of exemplar-specific features). On the other hand, we of course agree that there are principles in common between statistical learning and category learning that are leading to the consistent findings. We added new material to the Introduction to explain the importance of these new simulations in the domain of category learning, and the value we see in demonstrating convergence across domains. See response to Editor point #1 above.

      Second, I have some concerns with the relationship between the properties of this hippocampal network model and well described properties of single cells in the rodent and human hippocampus. In particular, the CA1 units in this model (and to some extent, also the CA3 units) come to respond strongly to all exemplars from within each category (e.g. as shown in Figure 3D, bottom right panel). This appears to be at odds with the known properties of place and concept cells from the rodent and human hippocampus, respectively, which show little generalisation across related concepts (i.e. the Jennifer Aniston neuron does not fire in response to other actors from Friends, for example). If the emergent properties of this model are not consistent with existing data, then it is not a valid model.

      We appreciate the opportunity to discuss connections to the physiology literature. See response to Editor summary point #2 above.

      More generally, the authors are clear that this model is "a microcosm of [the] hippocampus-neocortex relationship" and that the properties of the MSP "mirror those of neocortex". Why not assume that category learning is supported by an interaction between hippocampus and neocortex, then, as in the complementary learning systems (CLS) model? Aside from some correlational fMRI data and partial deficits in hippocampal amnesics - either of which could have a myriad of different explanations - what empirical data is better accounted for by this model than CLS? Put differently, what grounds are there for rejecting the CLS model? To some extent, this model appears to account for less empirical data than CLS, with the exception of a few recent neuroimaging studies (which are hard to interpret at the level of single cells).

      This is an important point for us to clarify, so we very much appreciate this comment. The crucial issue with CLS that motivated the microcosm theory is that the neocortex in the CLS framework learns far too slowly to support the kind of category learning studied in these paradigms, which unfolds over the course of minutes or hours. The neocortex in CLS was proposed to learn novel structure across days, months, and years.

      We have added the following to the Introduction:

      • “Despite its analogous properties, the MSP is not redundant with neocortex in this framework: the MSP allows rapid structure learning, on the timescale of minutes to hours, whereas the neocortex learns more slowly, across days, months, and years. The learning rate in the MSP is intermediate between the TSP (which operates as rapidly as one shot) and neocortex. The proposal is thus that the MSP is crucial to the extent that structure must be learned rapidly.”

      We also have this description in the Discussion:

      • “The MSP in our model has properties similar to the neocortex in that framework, with relatively more overlapping representations and a relatively slower learning rate, allowing it to behave as a miniature semantic memory system. The TSP and MSP in our model are thus a microcosm of the broader Complementary Learning Systems dynamic, with the MSP playing the role of a rapid learner of novel semantics, relative to the slower learning of neocortex.”
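
      The role of learning rate in this argument can be illustrated with a toy delta-rule learner. This is only a sketch under stated assumptions: the update rule, the exemplars, and the rate values are hypothetical and are not taken from C-HORSE, which uses a far richer architecture.

```python
def delta_rule(exemplars, lr, epochs):
    """Move a weight vector toward each exemplar in turn at rate lr."""
    weights = [0.0] * len(exemplars[0])
    for _ in range(epochs):
        for x in exemplars:
            weights = [w + lr * (xi - w) for w, xi in zip(weights, x)]
    return weights


# Hypothetical category: feature 0 is shared, the others vary by item.
exemplars = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
]
category_mean = [sum(col) / len(exemplars) for col in zip(*exemplars)]

# One-shot learner (rate 1.0): weights simply copy the latest exemplar.
fast = delta_rule(exemplars, lr=1.0, epochs=1)
# Slower learner (rate 0.1): weights settle near the average of the items.
slow = delta_rule(exemplars, lr=0.1, epochs=50)
```

      With a rate of 1.0 the weights end up equal to the last exemplar, so item detail is preserved but nothing is shared across items. With a rate of 0.1 the weights stay close to the category mean, which captures the shared feature at the cost of item detail.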

      Reviewer #3 (Public Review):

      The current work aimed to determine how the hippocampus may be able to detect regularities across experiences and how such a mechanism may serve to support category learning and generalization. Rapid learning in the hippocampus is critical for episodic memory and encoding of individual episodes. However, the rapid binding of arbitrary associations and one-shot learning was long thought suboptimal for finding regularities across experiences to support generalization, which were instead ascribed to other, slower-learning memory systems. More recent work has started to highlight hippocampal role in generalization, renewing the question of how generalization can be accomplished alongside memory for episodic details within a single memory structure. The current paper offers a reconciliation, presenting a biologically-inspired model of the hippocampus that is able to learn categories alongside stimulus-specific information comparably to human performance. The results convincingly demonstrate how distinct pathways within the hippocampus may differentially serve these complementary memory functions, enabling the single structure to support both episodic memory and categorization.

      Major strengths and contributions

      The paper includes simulation of three distinct categorization tasks, with a clear explanation of the unique aspects of each task. The key results are consistent across tasks, lending further support to the main conclusions of the role of distinct hippocampal pathways in learning specific details vs. regularities. Together with prior work on how the same architecture can support statistical learning in other types of tasks, this work provides important evidence of the broad role of the hippocampus in rapid integration of related information to serve many forms of cognition.

      Throughout the paper, the authors nicely explain in conceptual terms how the same underlying computations may serve all three categorization tasks as well as statistical learning and episodic inference tasks. Thus, the paper will be of broad interest, beyond researchers focused on modeling and/or categorization.

      On a conceptual level, this work provides a fruitful framework for understanding hippocampal functions, representations and computations. It provides a highly plausible mechanistic explanation of how category learning and generalization can be accomplished in the hippocampus and how distinct types of representations may emerge in distinct hippocampal subfields. The framework can be used to derive new testable predictions, some of which the authors themselves introduced. It also provides new insights into how the outputs of different pathways influence each other, providing a more nuanced view of the division of labor and interactions between hippocampal subfields. For example, the big loop recurrence would eventually lead to category influences even on the initially sparse, pattern separated representations in the CA3, which is an idea consistent with empirical observations.

      The presented computational model of the hippocampus is currently the most detailed and biologically plausible hippocampal model easily applicable in the area of cognitive neuroscience and behavioral simulations. The commonalities and differences with other related models (conceptual and computational) are well explained. Both the conceptual and technical descriptions of the model are exceptionally clear and detailed. The model is also publicly available for download for any researcher to use with their own task and data. All these aspects make it likely that other researchers may adopt the model in a wider range of tasks, stimulating new discoveries.

      The autoencoder nature of the model and the use of categorization tasks meant that some measures of interest, like recognition of exemplar-specific information, could not be evaluated by direct reading of the output layer to compare with some label (like old/new). The authors however came up with clever ways how to evaluate recognition performance in each task that was sensible and highlighted the multiple ways how one may think about information contained in neural representations in each layer. This approach can also be utilized by others for evaluating item-specific and category information in activation patterns, for example in analyses of fMRI.

      Finally, I thought the current paper and provided model may also serve as an excellent introduction to computational modeling for those new to this approach. The exceptional clarity of the conceptual and technical description of this model and the clear logic of how one may model a cognitive task and interpret results made this paper fairly accessible. Furthermore, the paper offered new insights and predictions based on analyzing the model's hidden layers, lesion performance, and/or noting some patterns of behavior unique to specific tasks. This was also instructive for highlighting the distinctive contributions that the computational modeling approach can have for furthering our understanding of cognition and the brain.

      We are extremely appreciative of the value the Reviewer sees in this work.

      Weaknesses

      The paper's strengths far outnumbered the weaknesses, which are minor. For one, the selected categorization tasks nicely complemented each other, but only covered stimuli with discrete-value dimensions (features like color, shape, symbol, etc). The degree to which the results generalize (or not) to continuous-value stimuli and different category structures (for instance information-integration or rule-based in COVIS framework) is not clear. How the model could be adjusted for continuous-value stimuli was not specified.

      We agree that the simulation of only discrete-valued dimensions is a limitation. We chose to do this simply because it is easier to use discrete values in the model as currently implemented, but future work will certainly need to test whether the model can simulate the various paradigms that make use of continuous-valued dimensions. We have added an explicit acknowledgement of this issue in the Methods:

      • “The inhibition simulates the action of inhibitory interneurons and is implemented using a set-point inhibitory current with k-winner-take-all dynamics (O’Reilly, Munakata, Frank, Hazy, & Contributors, 2014). All simulations involved tasks with discrete-valued dimensions, as these are more easily amenable to implementation across input/output units whose activity tends to become binarized as a result of these inhibition dynamics. It will be important for future work to extend to implementations of category learning tasks with continuous-valued dimensions.”
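
      As a rough illustration of why such inhibition favours discrete-valued inputs, here is a simplified hard k-winner-take-all function. It is an assumption-laden sketch: the cited framework implements inhibition as a set-point inhibitory current, whereas this version just keeps the k most active units and binarizes the result.

```python
def k_winners_take_all(activations, k):
    """Return a binary vector with 1.0 for the k most active units.

    Ties are broken in favour of the unit with the lower index.
    """
    if not 0 < k <= len(activations):
        raise ValueError("k must be between 1 and the number of units")
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    winners = set(ranked[:k])
    return [1.0 if i in winners else 0.0 for i in range(len(activations))]


# Hypothetical graded input to a layer of four units.
layer_output = k_winners_take_all([0.2, 0.9, 0.4, 0.7], k=2)
```

      Graded differences among the winners (0.9 versus 0.7 here) are discarded, which is why a continuous-valued stimulus dimension cannot be read directly off a layer of this kind.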

      There is compelling evidence for the dissociation between different hippocampal pathways and subfields (CA1 vs. CA3) that the model is based on. As the authors noted, there is also compelling evidence for functional dissociations along the long hippocampal axis, with anterior portions more geared towards coarse, generalized representations while posterior towards more detailed, specific representations. The authors nicely pointed out that these proposals of within-hippocampus division of labor are less orthogonal than they may first appear, as there is greater proportion of CA1 in the anterior hippocampus. However, it is premature to imply that this resolves the CA1/CA3 vs. anterior/posterior question; the idea that existing anterior findings may be simply CA1 findings is currently only speculation. Furthermore, first studies indicating that anterior/posterior representational gradients may exist within each subfield are beginning to emerge.

      We completely agree that this is speculative at this point, which needed acknowledgment. See response to Editor summary point #2 above.

    1. Under Our Team can Maternity Leave be replaced with just "on leave" -- i don't think it's necessary to share why they are on leave. May as well add Jazz Cook as well. Also, should we not include our pronouns here?

    1. Courses and programs within the “academic” curriculum emphasize subject-matter knowledge and the development of broadly applicable skills, such as history, science, language studies, etc.

      Trades are academic programs. In Hairstyling alone we learn history: how has the trade evolved? Which cultures developed certain styles and why? Science: formulating colours is chemistry, and mixing disinfectants is math and science! Technical terminology is language. My trade may not identify as one single category, but it includes several dimensions of learning. It offers students the opportunity to indulge in a variety of aspects, and perhaps that's why it has become increasingly interesting to students who possess multiple intelligences.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary: In this study, the authors generate a Drosophila model to assess disease-linked allelic variants in the UBA5 gene. In humans, variants in UBA5 have been associated with DEE44, characterized by developmental delay, seizures, and encephalopathy. Here, the authors set out to characterize the relationship between 12 disease-linked variants in UBA5 using a variety of assays in their Drosophila Uba5 model. They first show that human UBA5 can substitute all essential functions of the Drosophila Uba5 ortholog, and then assess phenotypes in flies expressing the various disease variants. Using these assays, the authors classify the alleles into mild, intermediate, and severe loss-of-function alleles. Further, the authors establish several important in vitro assays to determine the impacts of the disease alleles on Uba5 stability and function. Together, they find a relatively close correlation between in vivo and in vitro relationships between Uba5 alleles and establish a new Drosophila model to probe the etiology of Uba5-related disorders.

      Strengths: Overall, this is a convincing and well-executed study. There is clearly a need to assess disease-associated allelic variants to better understand human disorders, particularly for rare diseases, and this humanized fly model of Uba5 is a powerful system to rapidly evaluate variants and relationships to various phenotypes. The manuscript is well written, and the experiments are appropriately controlled.

      Reviewer #2 (Public Review):

      Relative simplicity and genetic accessibility of the fly brain make it a premier model system for studying the function of genes linked to various diseases in humans. Here, Pan et al. show that human UBA5, whose mutations cause developmental and epileptic encephalopathy, can functionally replace the fly homolog Uba5. The authors then systematically express in flies the different versions of the gene carrying clinically relevant SNPs and perform extensive phenotypic characterization such as survival rate, developmental timing, lifespan, locomotor and seizure activity, as well as in vitro biochemical characterization (stability, ATP binding, UFM-1 activation) of the corresponding recombinant proteins. The biochemical effects are well predicted by (or at least consistent with) the location of affected amino acids in the previously described Uba5 protein structure. Most strikingly, the severity of biochemical defects appears to closely track the severity of phenotypic defects observed in vivo in flies. While the paper does not provide many novel insights into the function of Uba5, it convincingly establishes the fly nervous system as a powerful model for future mechanistic studies.

      One potential limitation is the design of the expression system in this work. Even though the authors state that "human cDNA is expressed under the control of the endogenous Uba5 enhancer and promoter", it is in fact the Gal4 gene that is expressed from the endogenous locus, meaning that the cDNA expression level would inevitably be amplified in comparison. The fact that different effects were observed when some experiments were performed at different temperatures (18 vs. 25) is also consistent with this. While I do not think this caveat weakens the conclusions of this paper, it may impact the interpretation of future experiments that use these tools, and thus should be clearly discussed in the paper. Especially considering the authors argue that most disease variants of UBA5 are partial loss-of-functions, the amplification effect could potentially mask the phenotypes of milder hypomorphic alleles. If the authors could also show that the T2A-Gal4 expression pattern in the brain matches well with that of endogenous RNA or protein (e.g. using HCR-FISH or antibody), it would help to alleviate this concern.

      We thank the reviewer for pointing out this limitation.

      Regarding the humanization strategy we used in the study, we agree that this is a binary system which may lead to overexpression of the target protein. However, as the reviewer also points out, this temperature-sensitive system also enables us to flexibly adjust the expression level of the target protein, which is especially useful to study partial LoF variants such as the UBA5 variants in this study. In our study we have successfully compared the relevant allelic strength of most of the variants, which supports the use of our system in future studies. However, we do agree that the gene dosage effect could vary widely, so it is difficult to directly predict the effects of one variant in humans based upon results obtained in a model organism.

      We agree with the reviewer that a masking effect may exist in our system due to its gene overexpression nature. However, we cannot conclude that this masking effect really affects the interpretation of Group IA variants in our tests. The three variants are mild LoF, which is also supported by the biochemical assays. Hence, the variants may not cause any phenotype even when they are expressed at a physiological level.

      Regarding the temporal and spatial expression pattern of the T2A-GAL4, the Bellen lab has generated T2A-GAL4 lines for more than 3,000 genes. The expression pattern of the vast majority of these GAL4 lines faithfully reflects the expression pattern of the endogenous genes, which has been documented in our previous publications (PMIDs 25824290, 29565247, 31674908, 35723254).

      Reviewer #3 (Public Review):

      Summary:
      Variants in the UBA5 gene are associated with a rare developmental and epileptic encephalopathy, DEE44. This research developed a system to assess in vivo and in vitro genotype-phenotype relationships across a UBA5 allelic series, using humanized UBA5 fly models and biochemical activity assays. This study provides a basis for evaluating current and future individuals afflicted with this rare disease.

      Strengths:
      The authors developed a method to measure the enzymatic reaction activity of UBA5 mutants over time by applying the UbiReal method, which can monitor each reaction step of ubiquitination in real time using fluorescence polarization. They also classified fruit flies carrying humanized UBA5 variants into groups based on phenotype. They found a correlation between biochemical UBA5 activity and phenotype severity.

      Weaknesses:
      In the case of human DEE44, compound heterozygotes with both loss-of-function and hypomorphic forms (e.g., p.Ala371Thr, p.Asp389Gly, p.Asp389Tyr) may cause disease states. The presented models have failed to evaluate such cases.

      We agree with the reviewer that our model did not reflect the situation of the individuals who are compound heterozygous for a Group IA variant (p.Ala371Thr, p.Asp389Gly, or p.Asp389Tyr) and a strong LoF variant. However, we argue that our results do show that the Group IA variants alone do not cause disease. As discussed in the manuscript, individuals homozygous for the p.Ala371Thr variant are healthy and do not present with obvious phenotype. This is consistent with our findings in flies, and shows that the p.Ala371Thr variant is a mild LoF variant.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the thoughtful suggestions made by the Reviewers. We have addressed all of their comments below, with our responses bulleted and in italics. We believe these changes have helped clarify the manuscript and strengthen it overall.

      Reviewer 1

      1) Figures 1B and Supp. Figure 1A: It would be worth mentioning that the wave-form in the 129 strain in response to QLA starts out like AJ and B6, but transitions to looking like the wild-derived strain. So, although not quite as drastic as the NZO and NOD strains, it is not quite like the other classical inbred strains.

      • We thank the reviewer for pointing this out. We have added further language to clarify the point:

      “Additionally, even with the clear separation between the clusters, inter-strain variation was still observed within the clusters (e.g. more 129 islets had plateau responses to 8G/QLA than the B6 or AJ).”

      2) The figures are generally excellent and really help to clarify the work in the paper. For Figure 2A, it would help even further if you could number the six different Ca++ parameters that are measured. They're all there, but it takes a bit of time to find them on the figure and numbering will make it easier on your reader.

      • We appreciate this suggestion and have implemented it in our revised Figure 2A. The Ca2+ parameters are now numbered, and the description of this figure has been adjusted accordingly in the results section.

      We added the revised text in the results section:

      “To elucidate strain differences in Ca2+ dynamics, we focused on six parameters of the Ca2+ waveform (Figure 2A): 1) peak Ca2+ (the top of each oscillation); 2) period (the length of time between two peaks); 3) active duration (the length of time for each Ca2+ oscillation measured at half of the peak height, also known as the oxidative “secretory” phase, or “MitoOx” (8)); 4) pulse duration (active duration plus extra time for Ca2+ extrusion); 5) silent duration (the electrically-silent “triggering” phase, also known as “MitoCat” (8), which culminates in KATP closure and membrane depolarization); and 6) plateau fraction (the active duration divided by the period, or the fraction of time spent in the active “secretory” phase).”
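
The plateau fraction defined in the quoted text is a simple ratio, so a short sketch may help make the definition concrete. The function name and the input lists below are hypothetical and are not taken from the manuscript's analysis code; the sketch only assumes that peak times and per-oscillation active durations (in the same time unit) have already been extracted from a Ca2+ trace.

```python
def period_and_plateau_fraction(peak_times, active_durations):
    """Return (period, plateau_fraction) for each pair of consecutive peaks.

    peak_times: times of successive Ca2+ peaks (hypothetical input).
    active_durations: active duration of each oscillation, measured at
        half of the peak height (hypothetical input, one per period).
    """
    results = []
    for i in range(len(peak_times) - 1):
        # Period: the length of time between two peaks.
        period = peak_times[i + 1] - peak_times[i]
        # Plateau fraction: active duration divided by the period.
        plateau_fraction = active_durations[i] / period
        results.append((period, plateau_fraction))
    return results
```

For example, peaks at 0, 100, and 220 s with active durations of 40 and 60 s would give periods of 100 and 120 s and plateau fractions of 0.4 and 0.5.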

      3) Figure 4A, B: I was expecting to see Ca++ vs insulin parameters in the different strains/sexes. In addition to the heat maps, it would be useful to see the regression plots, showing where each strain and sex falls for the insulin and Ca++ parameters.

      • This is an excellent suggestion, and we have added a new Supplemental Figure 5 to provide examples of various strain/sex patterns that drive the correlations used for the heatmap and histogram in Figure 4A and B.

      We added text in the results section referring to this point:

      “Clustering the Ca2+ responses into distinct groups based on our observations of the waveforms (Figure 1B, Figure 4C-E, and Supplemental Figures 1 and 2) also occurs when correlating individual Ca2+ parameters to ex vivo secretion and clinical data (Supplemental Figure 5). For example, the anticorrelation between the 1st frequency component in 8G and percent insulin secreted in 8.3G/QLA (Supplemental Figure 5A) separates the classic inbred, wild-derived, and diabetes-susceptible strains into distinct groups despite the variability in the trait. Correlation between the silent duration in 8G/QLA to insulin secretion in 8.3G/QLA, likewise groups by strain (Supplemental Figure 5B). Finally, some correlations, such as that between 8G/QLA/GIP silent duration and plasma insulin at sacrifice (Supplemental Figure 5C), can be strongly influenced by outlier strains; e.g., NZO. Collectively, these data demonstrate that genetics has a profound influence on key parameters of islet Ca2+ oscillations.”

      4) Please include methods for the insulin measurements collected in Fig. 4.

      • Thank you for pointing out this missing information. We have clarified that prior insulin measurements (plasma insulin and ex vivo static insulin secretion that were used in Figure 4 for correlation analysis) were completed in another previously published cohort of mice (reference 17: Mitok KA, Freiberger EC, Schueler KL, Rabaglia ME, Stapleton DS, Kwiecien NW, et al. Islet proteomics reveals genetic variation in dopamine production resulting in altered insulin secretion. The Journal of biological chemistry. 2018;293(16):5860-77).

      We added this new text (highlighted) to the results section to help clarify this point:

      “Fasting blood glucose and insulin levels were measured in mice at 19 weeks of age, except for the NZO males which were measured at 12 weeks of age. Glucose was analyzed by the glucose oxidase method using a commercially available kit (TR15221, Thermo Fisher Scientific), and insulin was measured by radioimmunoassay (RIA; SRI13K, Millipore). This is the same assay that was used to measure plasma insulin for the previously published cohort used for the correlation analysis in Figure 4 (17).”

      5) In the methods, please include details on the four conditions used for Ca++ imaging of the islets, and the timing for each condition.

      • We appreciate this guidance in clarifying our manuscript, and we have now included the conditions and timing for each condition in the methods section.

      We added the following text to the results section to help clarify this:

      “The solutions included 8 mM glucose (8G), 8 mM glucose + 2 mM glutamine, 0.5 mM leucine, and 1.5 mM alanine (8G/QLA), 8G/QLA + 10 nM glucose-dependent insulinotropic polypeptide (8G/QLA/GIP), and 2 mM glucose (2G), each of which was kept in a 37°C water bath.”

      Reviewer 2

      One major critique is that the authors studied "the human orthologues of the correlated mouse proteins that are proximal to the glycemia-associated SNPs in human GWAS". This implies two assumptions - (1) human and mouse proteins do not differ in terms of islet physiology and calcium signaling; (2) the proteins proximal to the SNPs are the causal factors for functional differences, though the SNPs could affect protein/gene function distant from the SNPs.

      • Thank you very much for highlighting this limitation in our study. We think this is very important to address, and we have done so in our discussion section.

      We have added the following text to discuss this important issue:

      “Our approach to merge human GWAS with our findings in mouse assumes that the glycemic-related SNPs we nominated alter the abundance or function of the human orthologues. Most SNPs that are strongly associated with phenotypes in human GWAS are noncoding, residing within introns, promoters, 3’UTRs, or intergenic regions (e.g. Figure 6). Therefore, a limitation of our approach is the assumption that SNPs regulate the gene they are proximal to, which is not always accurate (76-78). To infer a more direct link between SNPs and potential target genes, we incorporated human islet chromatin data (37). Physical contact between a region containing SNPs and a distal gene supports a regulatory role, as for ACP1 (Figure 6B). Additionally, SNPs within regions of open chromatin (ATAC-seq) and actively transcribed regions (histone markers) suggest a higher likelihood of regulating transcription factor access. While this approach does not conclusively show a link between the SNPs and expression of the orthologue for our candidate proteins, these chromatin data more strongly suggest that the orthologue expression may be regulated by the candidates’ SNPs.”

    1. Reviewer #3 (Public Review):

      The authors report a study in which they use intracranial recordings to dissociate subjectively aware and subjectively unaware stimuli, focusing mainly on prefrontal cortex. Although this paper reports some interesting findings (the videos are very nice and informative!) the interpretation of the data is unfortunately problematic for several reasons. I will detail my main comments below. If the authors address these comments well, I believe the paper may provide an interesting contribution to further specifying the neural mechanisms important for conscious access (in line with Gaillard et al., Plos Biology 2009).

      The main problem with the interpretation of the data is that the authors have NOT used a so-called "no-report paradigm". The idea of no report paradigms is that subjects passively view a certain stimulus without the instruction to "do something with it", e.g., detect the stimulus, immediately or later in time. Because of the confusion of this term, specifically being related to the "act of reporting", some have argued we should use the term no-cognition paradigm instead (Block, TiCS, 2019, see also Pitts et al., Phil Trans B 2018). The crucial aspect is that, in these types of paradigms, the critical stimulus should be task-irrelevant and thus not be associated with any task (immediately or later). Because in this experiment subjects were instructed to detect the gratings when cued 600 ms later in time, the stimuli are task relevant, they have to be reported about later and therefore trigger all kinds of (known and potentially unknown) cognitive processes at the moment the stimuli are detected in real-time (so stimulus-locked). You could argue that the setup of this delayed response task excludes some very specific report related processes (e.g., the preparation of an eye-movement), which is good, however this is usually not considered the main issue. For example when comparing masked versus unmasked stimuli (Gaillard et al., 2009 Plos Biology), these conditions usually also both contain responses but these response related processes are "averaged out" in the specific contrasts (unmasked > masked). In this paper, RT differences between conditions (that are present in this dataset) are taken care of by using this delayed response in this paper, which is a nice feature for that and is not the case for the above example set-up.

      Given the task instructions, and this being merely a delayed-response task, it is to be expected that prefrontal cortex shows stronger activity for subjectively aware versus subjectively unaware stimuli. Unfortunately, given the nature of this task, the novelty of the findings is severely reduced. The authors cannot claim that prefrontal cortex is associated with "visual awareness", or what people have called phenomenal consciousness (this is the goal of using no-cognition paradigms). The only conclusion that can be drawn is that prefrontal cortex activity is associated with accessing sensory input: and hence conscious access. This less novel observation has been shown many times before and there is also little disagreement about this issue between different theories of consciousness (e.g., global workspace theory and local recurrency theories both agree on this).

      The best solution at this point seems to rewrite the paper entirely in light of this. My advice would be to state in the introduction that the authors investigate conscious access using iEEG and then not refer too much to no-cognition paradigm or maybe highlight some different strategies about using task-irrelevant stimuli (see Canales-Johnson et al., Plos Biology 2023; Hesse et al., eLife 2020; Hatamimajoumerd et al Curr Bio 2022; Alilovic et al., Plos Biology 2023; Pitts et al., Frontiers 2014; Dwarakanth et al., Neuron 2023 and more). Obviously, the authors should then also not claim that their results solve debates about theories regarding visual awareness (in the "no-cognition" sense, or phenomenal consciousness), for example in relation to the debate about the "front or the back of the brain", because the data do not inform that discussion. Basically, the authors can just discuss their results in detail (related to timing, frequency, synchronization etc) and relate the different signatures that they have observed to conscious access.

      I think the authors have to discuss the Gaillard et al PLOS Biology 2009 paper in much more detail. Gaillard et al also report a study related to conscious access contrasting unmasked and masked stimuli using iEEG. In this paper they also report ERP, time frequency and phase synchronization results (and even Granger causality). Because of the similarities in approach, I think it would be important to directly compare the results presented in that paper with results presented here and highlight the commonalities and discrepancies in the Discussion.

      In the Gaillard paper they report a figure plotting the percentage of significant frontal electrodes across time (figure 4A) in which it can be seen that significant electrodes emerge after approximately 250 ms in PFC as well. It would be great if the authors could make a similar figure to compare results. In the current paper there are much more frontal electrode contacts than in the Gaillard paper, so that is interesting in itself.

      In my opinion, some of the most interesting results are not highlighted: the findings that subjectively unaware stimuli show increased activations in the prefrontal cortex as compared to stimulus absent trials (e.g., Figure 4D). Previous work has shown PFC activations to masked stimuli (e.g., van Gaal et al., J Neuroscience 2008, 2010; Lau and Passingham J Neurosci 2007) as well as PFC activations to subjectively unaware stimuli (e.g., King, Pescetelli, and Dehaene, Neuron 2016) and this is a very nice illustration of that with methods having more detailed spatial precision. Although potentially interesting, I wonder about the objective detection performance of the stimuli in this task. So please report objective detection performance for the patients and the healthy subjects, using signal detection theoretic d'. This gives the reader an idea of how good subjects were in detecting the presence/absence of the gratings. Likely, this reveals far above chance detection performance and in that case I would interpret these findings as "PFC activation to stimuli indicated as subjectively unaware" and not unconscious stimuli. See Stein et al., Plos Biology 2021 for a direct comparison of subjectively and objectively unaware stimuli.
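
For reference, d' is conventionally computed as the z-transformed hit rate minus the z-transformed false-alarm rate. The sketch below is only an illustration of that calculation: the function name is hypothetical, and the log-linear correction (adding 0.5 to the counts) is an assumption of this sketch rather than something specified in the review or the manuscript.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal detection d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (an assumption of this sketch): add 0.5 to the
    # numerator and 1 to the denominator so that rates of exactly 0 or 1
    # do not produce infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)
```

A value near 0 indicates chance-level detection of grating presence versus absence, while clearly positive values indicate above-chance objective detection.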

      In Figure 7 of the paper the authors want to make the case that the contrast does not differ between subjectively aware stimuli and subjectively unaware stimuli. However so far they've done the majority of their analyses across subjects, and for this analysis the authors only performed within-subject tests, which is not a fair comparison imo. Because several P values are very close to significance I anticipate that a test across subjects will clearly show that the contrast level of the subjectively aware stimuli is higher than of the subjectively unaware stimuli, at the group level. A solution to this would be to subselect trials from one condition (NA) to match the contrast of the other condition (NU), and thereby create two conditions that are matched in contrast levels of the stimuli included. Then do all the analyses on the matched conditions.

      Related, Figure 7B is confusing and the results are puzzling. Why is there such a strong below chance decoding on the diagonal? (also even before stimulus onset) Please clarify the goal and approach of this analysis and also discuss/explain better what they mean.

      I was somewhat surprised by several statements in the paper and it felt that the authors may not be aware of several intricacies in the field of consciousness. For example a statement like the following "Consciousness, as a high-level cognitive function of the brain, should have some similar effects as other cognitive functions on behavior (for example, saccadic reaction time). With this question in mind, we carefully searched the literature about the relationship between consciousness and behavior; surprisingly, we failed to find any relevant literature." This is rather problematic for at least two reasons. First, not everyone would agree that consciousness is a high-level cognitive function and second there are many papers arguing for a certain relationship between consciousness and behavior (Dehaene and Naccache, 2001 Cognition; van Gaal et al., 2012, Frontiers in Neuroscience; Block 1995, BBS; Lamme, Frontiers in Psychology, 2020; Seth, 2008 and many more). Further, the explanation for the reaction time differences in this specific case is likely related to the fact that subjects' confidence in that decision is much higher in the aware trials than in the unaware trials, hence the speeded response for the first. This is a phenomenon that is often observed if one explores the "confidence literature". Although the authors have not measured confidence I would not make too much out of this RT difference.

      I would be interested in a lateralized analysis, in which the authors compare the PFC responses and connectivity profiles using PLV as a function of stimulus location (thus comparing electrodes contralateral to the presented stimulus and electrodes ipsilateral to the presented stimulus). If possible, this may give interesting insights into the mechanism of global ignition (global broadcasting), supposing that for contralateral electrodes information does not have to cross from one hemisphere to another, whereas for ipsilateral electrodes that is the case (which may take time). Gaillard et al. refer to this issue as well in their paper, and this issue is sometimes discussed in relation to Global workspace theory. This would add novelty to the findings of the paper in my opinion.

    1. We’ve chosen to keep highlights private to avoid pages being cluttered by highlights that have no surrounding discussion. We understand that people may want to share highlights with others, and we think there are effective ways we can address that in the future.

      You would imagine that by now you would be able to share some of your highlights without having to add some weird annotation to it especially when you are trying to share it on a private group.

      It is also quite worrisome that the last time there was a comment about this was in 2019. It is almost like there's no work or effort being put into this one.

      So as much as there's a comment about "thinking of effective ways", there's no clear indication that any work is going into it.

    1. This is the main concern raised by the public, a risk of large-scale or even unprecedented impacts on public health or the biosphere. This is one example of many: I am extremely concerned that this proposed action could potentially contaminate native life forms on Mars and/or bring back alien virus, bacteria, or other life forms from Mars to Earth. I understand that there are planetary protection protocols. However, Murphy's Law says that if something horrible could happen, it eventually may indeed occur. History is filled with examples where Acts of God and/or human arrogance caused otherwise unforeseen disasters. .... The Earth is already dealing with increasingly serious problems from invasive or alien species being transported to new locations, and viruses mutating and causing deadly pandemics. We have not been able to solve many of these problems. What happens if a Mars life form escapes containment and, without evolving in Earth's ecosystems, spreads uncontrollably and devastates Earth's species including us humans? There might be no way to reverse or even mitigate for that devastation. I support scientific research when it is safe and in the public interest. However, I oppose research when there is no absolute guarantee of safety and when the risks outweigh the potential benefits. (Spotts, 2022) I provide direct links to all the comments submitted in the final round of public comments with a brief summary of the level of concern for each one here: Most public comments share Sagan's priority that NASA can't take a risk of large-scale harm NASA's response to Spotts was: "Refer to the previous response for HS-002" (NASA, 2023 : B-5) HS-002 is their answer to another similar question: Granger:Are you certain that in any way, this mission won’t end with the total annihilation of the entire planet, or force us to live in biomes for the rest of time? 
NASA: As discussed in Section 3.2 of the PEIS, the exact nature of the Mars sample constituents regarding biosignatures and potential biological activity is currently unknown. The PEIS cites several sources supporting the position that contamination of Earth by Martian microorganisms is extremely unlikely to pose a risk of significant harmful effects. However, the risk cannot be demonstrated to be zero (see Response ID HS-001 for information regarding containment measures). As a result, a comprehensive quantitative analysis of the potential impacts of a sample release in the event of an off-nominal landing and the effects of Mars samples on Earth’s environment cannot be accomplished with current data; any such analysis would be theoretical at best, involving substantial speculation and supposition. For this reason, the emphasis of the MSR approach is on sample containment (NASA, 2023 : B-43) So even in response to a concern by a member of the public who asked NASA if it is possible that one consequence would be that we have to live in biomes on Earth for the rest of time or total annihilation of the planet (presumably meaning extinction of all terrestrial life) NASA were not able to rule this out as a possible consequence of their mission. Instead NASA responds by saying that the emphasis is on sample containment, since they can't predict consequences if the samples are not contained. As we saw at the start expert opinion is that the risk of such scenarios is very low, and the analogy of a house fire and a smoke detector fits them well. But we take great care to protect our houses from the very low risk scenario of a house fire. Smoke detector analogy for the low risk of large-scale harm to human health and Earth's biosphere Later in this paper we look at a couple of examples of a likely very low risk but of unprecedented harm. 
The mirror life scenario in worst case where we can't engineer microbes to stop it could be incompatible with our ecosystems and take over the soils, and then we'd need to maintain the terrestrial ecosystems in biomes and keep out mirror life. It wouldn't happen instantly but as it radiates and spreads through the ecosystems we'd then need to work to rescue them and the only solution might be large dome-like biomes covering them and barriers in the soil and then measures within to sterilize them of mirror life and to keep it out. Detailed scenarios of mirror life and a novel fungal genus to motivate biosafety planning This doesn't fit their conclusion in the PEIS itself that any environmental effects would not be significant. A non zero risk of large-scale harm to Earth's biosphere that could lead to humans having to live in biomes for the rest of time is NOT identical to NASA's conclusion in the PEIS of no risk of global harm. Chester Everline, the expert on probabilistic risk assurance who commented on the last day of public comments put it like this: Given our lack of scientific insight into possible life on Mars, relics of life we may return from Mars, or simply organic substances from Mars that could interact with certain life forms on Earth, how can we possibly assert with confidence that MSR poses an acceptable risk to Earth's biosphere, even if the incredibly difficult target of a 99.9999% target for successful containment is satisfied? Given that sample return missions of the type proposed for MSR have never been attempted before, is it even feasible to do enough testing to assure that a 99.9999% target can be achieved? (NASA, 2023 : B38) NASA's response: Please see Response IDs HS-001 and HS-002 regarding risks to Earth’s biosphere and NASA’s approach to addressing that. With regards to the assurance case (HS- 017), no outcome in science and engineering processes can be predicted with 100% certainty. 
NASA’s extensive testing activities serve to support the assurance case (NASA, 2023 : B38) NASA's statement there "no outcome in science and engineering processes can be predicted with 100% certainty" is not valid. It is frequently the case that we can predict outcomes in science and in engineering with 100% certainty. In this case, for instance, we can predict with 100% certainty that if NASA doesn't return these samples, there is no risk to Earth's biosphere or inhabitants from the samples that Perseverance is currently caching on Mars. We can also achieve the very high level of "no appreciable risk" or essentially 100% safety by sterilizing all samples returned to Earth with a sufficiently high level of ionizing radiation. We are not required to take ANY risks with Earth's biosphere. Whether to take such a risk is an ethical decision and not a decision that can be mandated by scientists or engineers. Chester Everline continues: Does NASA intend to impose a threshold for acceptable risk (i.e., a value above which the mission is considered too risky to proceed)? A possible consequence of unsuccessful containment is an ecological catastrophe. Although such an occurrence is unlikely, NASA should at least be clear regarding what level of risk it is willing to assume (for the biosphere of the entire planet)

      I think there is no mention of experts already having problems with the way NASA is dealing with this. If there is, ignore this comment. But I feel that this should be mentioned near the top, with a pointer down here for more information.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript describes that simultaneous inhibition of LOXL2 and BRD4 reduces proliferation of TNBC in vitro and reduces growth in vivo.

      This observation is followed by extensive mechanistic studies that suggest physical interaction between LOXL2 and short isoform of BRD4-MED1. Inferences from Chip-seq analyses suggest that this interaction is involved in regulation of multiple transcriptional programs. Authors focus on differential activation of DREAM complex, to claim that this interaction "is fundamental for proliferation of TNBC". The manuscript is very well written and mechanistic inferences are based on a set of sophisticated epigenetic analyses and bioinformatical inferences. The phenotypic effects from LoxL2 inhibition by itself, or in combination with BRD4 inhibition are relatively modest. These modest effects, as well as many of the reported changes in gene expression are clearly inconsistent with the frequently used adjectives as "dramatic", "fundamental", "deeply affected", "drastically hampered" etc. Given the modest phenotypic effects, many of the key claims and conclusions are not supported by the data.

      We thank the reviewer for appreciating our work, defining the manuscript as well-written, and saying that it comprises extensive mechanistic studies as well as sophisticated epigenetic analysis.

      We apologize if some of our statements seemed exaggerated. In this revised version, we revisited some of our conclusions to moderate them.

      Moreover, we took the reviewer's criticism as an opportunity to strengthen our findings. In the revised version of the manuscript, we included an additional TNBC PDX (PDX-127), and results from this experiment clearly reinforce our claims (Fig. 6D and Fig. EV9E-F). In this new in vivo experiment, we selected a PDX model in which the expression of BRD4L is not detectable, while BRD4S is clearly expressed. Therefore, the treatment with JQ1 would specifically affect the activity of BRD4S, making the treatment selective. Additionally, we halved the administered dose of JQ1 to limit the effect of BRD4S inhibition alone on tumor growth. The combinatorial treatment (JQ1+PXS) induced a clearly superior effect in this setting compared with single-agent treatments. In addition, we ruled out that the observed growth reduction is the result of the sole inhibition of LOXL2, which could affect FAK/Src activity or extracellular collagen crosslinking. In conclusion, our data show that the combinatorial inhibition of LOXL2 and BRD4S is effective in reducing tumor proliferation in TNBC in vivo models, independently of the inhibition of BRD4S and of other pathways known to be regulated by LOXL2.

      Specifically:

      1) It is unclear why authors generalize their conclusions to TNBC. Figure 1B demonstrates synergy for 1/3 cell lines, which is chosen for the follow up study. Even for MDA231, the synergy is confined to low concentrations of BRD4i (S1c). While MDA231 cell line is frequently used in experimental studies of TNBC, it is quite dissimilar to majority of clinical TNBC, and contains mutant RAS, which is rare in this disease.

      The synergistic effect is observed in MDA-MB-231 cells because only this cell line expresses both BRD4S and LOXL2. Indeed, in Fig. 1C we show that MDA-MB-468 cells do not express LOXL2, while BT549 only express minimal BRD4 levels.

      To corroborate this hypothesis, in the revised version of the manuscript we added:

      1. A new cell line (Cal51) expressing the same LOXL2 and BRD4 levels (Fig. EV8C) but showing greater resistance to JQ1 than MDA-MB-231 (Fig. EV8D). Also, in this cell line, we could show that the combinatorial treatment had a superior effect on cell viability than the single agents’ treatment (Fig. EV8E).
      2. A western blot panel of different TNBC PDXs shows that the majority of them express medium to high levels of both BRD4S and LOXL2 proteins, as is the case of MDA-MB-231 (Fig. EV9E) and Cal51 (Fig. EV8C). This result suggests that the combinatorial treatment could be used in the majority of TNBC patients as they are expected to express both BRD4S and LOXL2.
      3. Finally, as explained above, we performed another in vivo experiment, choosing a PDX that expresses BRD4S (but not BRD4L) and LOXL2 (PDX-127) (Fig. 6D and Fig. EV9E-F). In this new model, too, we observed that the combinatorial inhibition had a greater effect than the single treatments.

        2) In vivo, the effect appears to be modest even in the MDA231 model, selected for evidence of synergy in vitro. In vivo, the combination appears to have an additive effect. Tumor growth rates are reduced, but no shrinkage is occurring. In the PDX model, LOXL2i does not have an effect as a monotherapy, while modestly enhancing the impact of BRD4i. These results are at odds with the claim of the interaction being fundamental for proliferation.

      We agree with the reviewer that the combinatorial inhibition appears to have an additive effect in vivo using the MDA-MB-231 model.

      1. For that reason, we have now performed the in vivo PDX experiment mentioned above (PDX-127; Fig. 6D and Fig. EV9E-F), in which we decreased the dose of JQ1 by half to avoid a strong effect on tumor growth due to BRD4 inhibition alone. In this new experiment, the synergistic effect is evident. While single-agent treatment showed a very moderate effect (0% or 20% tumor growth reduction for LOXL2 inhibition and JQ1, respectively), the combinatorial treatment showed a 50% reduction in tumor volume, further supporting our conclusions.
      2. We also performed either BRD4 or MED1 pull-down experiments in the presence of PXS and JQ1. We show that upon PXS treatment, the interaction between LOXL2 and BRD4S is maintained while the interaction with MED1 is reduced (Fig. 5A-C). However, in the presence of JQ1, the interaction between LOXL2 and MED1 is maintained while BRD4S-LOXL2 and BRD4S-MED1 interactions are impaired (Fig. 5D-F). These new results explain why monotherapy does not have a sufficient effect in vivo and set the rationale for the use of the combinatorial treatment. We believe that these new results corroborate our initial findings, and we hope to have addressed the reviewer's comments.
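      As an illustration of how the percentages quoted in point 1 can be read, the following minimal sketch compares the observed combination effect with the effect expected if the two agents acted independently. The Bliss independence model used here is an assumption made for illustration; the response does not state which synergy model, if any, was applied, and the function names are hypothetical.

```python
def bliss_expected(effect_a, effect_b):
    """Expected fractional effect of two independent agents (Bliss independence).

    Effects are fractions between 0 and 1, e.g. 0.20 for a 20% reduction.
    """
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a, effect_b, effect_combo):
    """Observed combination effect minus the Bliss expectation.

    A positive value means the combination did better than independence predicts.
    """
    return effect_combo - bliss_expected(effect_a, effect_b)


# Tumor growth reductions quoted in the response for PDX-127:
# LOXL2 inhibitor alone 0%, JQ1 alone 20%, combination 50%.
pxs_alone, jq1_alone, combination = 0.0, 0.20, 0.50

expected = bliss_expected(pxs_alone, jq1_alone)
excess = bliss_excess(pxs_alone, jq1_alone, combination)
print(f"Bliss expectation: {expected:.2f}, observed: {combination:.2f}, excess: {excess:.2f}")
```

      Under this assumed model, the combination exceeds the independence expectation by roughly 30 percentage points. A single summary value per arm is not a statistical test, so the comparison is descriptive only.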

      3) No analysis of cell proliferation was shown in vivo. Authors should have performed BrdU or KI67 staining to support the claim. For in vitro analyses, authors also used indirect assays for proliferation. PI staining by itself does not have sufficient resolution to clearly capture modest effects that authors demonstrate. BrdU-PI double staining would have been much more useful.

      We appreciate the reviewer’s comment. In the revised manuscript we have added Ki67 and H3S10p staining in the tumor samples for the new in vivo PDX experiment (Fig. 6E and Fig. EV10A-C). We show that the combinatorial treatment significantly induces a reduction of both proliferation markers, which is in agreement with a reduced tumor volume. Regarding the in vitro analysis, we did not only use PI staining to show a reduced proliferation state but also H3S10p staining (Fig. 4B) and an SLBP1 fluorescent reporter MDA-MB-231 cell line (Fig. 4D, Fig. EV6B, E, and Movie EV). In the revised version of the manuscript, we included a new FACS-PI analysis (Fig. 4A, C) to better represent the effects we see on the cell cycle.

      Minor points:

      Dose dependent decrease in phosphorylated H3 is not at all obvious from eyeballing the data in S1A; the only effect that I see is a modest reduction at the highest concentration of the inhibitor. Authors need to quantify the results to support the claim.

      We agree with the reviewer and we apologize for the misinterpretation. We have changed the revised manuscript as follows: “The selective LOXL2 inhibitor PXS-538224 (hereafter, PXS) efficiently reduced the levels of oxidized histone H3 (H3K4ox) in MDA-MB-231 cells at 40 μM (Fig. EV6C), indicating an efficient inhibition of LOXL2 catalytic activity in the nucleus.”

      Most of breast cancer cell lines are derived from metastatic disease, including pleural effusion, thus the point that because MDA231 cell line is derived from pleural effusion, it is metastatic does not have sufficient logical foundation.

      Many publications have shown the high metastatic capacity of MDA-MB-231 cells (e.g. https://doi.org/10.1016/j.bbabio.2011.04.015, doi: 10.1038/s41467-017-01829-1), which are therefore used as a metastatic TNBC model. The aim of the analysis reported in Fig. 6C was only to show whether any of the treatments used could reduce the metastatic capacity of this cell line. We believe we do not overstate the results but simply report them as they are.

      How is loss of cell-cell junction in vitro consistent with LOXL2 role in modulating ECM? There is no evidence of ECM production in MDA231 in vitro. On the other hand, this loss is associated with EMT.

      We thank the reviewer for identifying this mistake. In the revised manuscript we changed the text as follows: “Gene set enrichment analysis (GSEA) revealed that LOXL2 KD induced upregulation of processes involved in cell morphology, secretion, membrane trafficking, and cell differentiation, with cell-cell junction being one of the most significantly affected pathways (Fig. EV5E). These results agree with the role of LOXL2 in regulating epithelial-to-mesenchymal transition, corroborating the high quality of our dataset.”

      Reviewer #1 (Significance (Required)):

      Discovery and characterization of LOXL2-BRD4 interaction is advancing the ever-deepening understanding of molecular mechanisms of regulation of gene expression. The studies and analyses appear to be sufficiently rigorous and reported with clarity, and the claimed discovery of the biological interaction between LOXL2 and BRD4 is well supported. However, given the magnitude of the reported (rather than claimed) effects of this interaction, and concerns about generalizability of authors conclusions, it is not clear how these results are promising for the development of new therapies in TNBC. Moreover, in contrast to luminal BC, there is no clear evidence for utility of cytostatic drugs in constraining TNBC. Therefore, biological and clinical significance of the authors discovery is unclear and claims in this regard appear to be overblown

      We thank the reviewer for stating that our analysis is rigorous and reported with clarity. We really took the criticisms as an opportunity to strengthen our findings, as explained above.

      For the newly presented in vivo PDX model, we performed immunohistochemistry of Ki67, H3S10p and Cleaved Caspase 3 to check whether the reduction of tumor volume observed in the combinatorial treatment was a result of a cytotoxic and/or a cytostatic effect (Fig. 6E and Fig. EV10A-C). As shown in the figure, the combination of the two inhibitors induced a greater decrease of Ki67 and H3S10p, and a clear increase of Cleaved Caspase 3. Therefore, these new data indicate that the combinatorial treatment has not only a cytostatic but also a cytotoxic effect, suggesting that it could be exploited clinically for the treatment of TNBC patients.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their study, Pascual-Reguant et al. show that combined inhibition of BRD4 and LOXL2 can synergize to restrict triple-negative breast cancer (TNBC) proliferation. BRD4 and LOXL2 are transcription regulators that can read and write epigenetic information, respectively. The authors employ three distinct breast cancer cell lines and mouse models with cell line-derived xenografts, and they show that combined inhibition of BRD4 and LOXL2 can be superior to single BRD4/LOXL2 inhibition in these model systems. In an attempt to identify a connection between BRD4 and LOXL2, the authors find that the two proteins can bind to each other. The authors performed most of the experiments in the breast cancer cell line MDA-MB-231. To assess the impact of LOXL2-inhibition on transcription, the authors assessed changes of the transcriptome in MDA-MB-231 cells following LOXL2 knockdown. They found that genes related to cell differentiation and morphology were upregulated, while genes related to the cell cycle were downregulated. ChIP-seq data of BRD4 showed that BRD4 can bind to cell cycle gene promoters and that this binding was enhanced upon loss of LOXL2. The authors found that LOXL2 and BRD4 interacted with the transcriptional cell cycle regulators B-MYB, FOXM1, and LIN9, which are components of the MYB-MuvB-FOXM1 (MMB-FOXM1) complex that is known to promote the expression of late cell cycle genes with important functions during mitosis. The authors conclude that LOXL2/BRD4 interact with each other and with the MMB-FOXM1 complex to drive the expression of cell cycle genes and cell proliferations. Vice versa, they conclude that inhibition of LOXL2/BRD4 reduces cell proliferation through inhibiting the expression of cell cycle genes.

      Major:

      • The data and methods are presented well. The experiments are adequately replicated and analyzed. However, except for the first section, all experiments were performed using only one cell line. It is important to validate key findings in at least a second cell line.

      We thank the reviewer for valuing our work.

      To address the reviewer’s comment, in the revised manuscript we added an additional cell line (Cal-51), that expresses similar levels of LOXL2 and BRD4 as compared to MDA-MB-231 (Fig. EV8C). Even though this cell line is clearly more resistant to JQ1 than the MDA-MB-231 cell line (Fig. EV8D), the combinatorial treatment is significantly more effective as compared with single agents’ treatment (Fig. EV8E).

      Moreover, we have also performed an additional in vivo experiment using another TNBC PDX (PDX-127) that expresses LOXL2 and BRD4S, but not BRD4L. Given that JQ1 can inhibit both BRD4 isoforms, this in vivo system allowed us to demonstrate that the tumor antiproliferative capacity of the combinatorial treatment is due to the simultaneous inhibition of LOXL2 and BRD4S (rather than BRD4S and L) (Fig. 6D and Fig. EV9E-F).

      • There appears to be a misunderstanding of the concept of cell cycle-dependent gene regulation by the DREAM complex and its related factors. Early (G1/S) cell cycle genes contain E2F promoter motifs, while late (G2/M) cell cycle genes contain CHR promoter motifs. The DREAM complex can bind both, while RB-E2F and MuvB recognize only E2F and CHR motifs, respectively. B-MYB and FOXM1 bind to MuvB and regulate late cell cycle genes, but they do not bind to early cell cycle genes. Given this concept, the authors' rationale to connect BRD4/LOXL2 through MuvB/B-MYB/FOXM1 with E2F promoter sequences and early cell cycle genes and the subsequent conclusions must be corrected.

      We thank the reviewer for their expert explanation. We corrected our conclusion in the revised version of the manuscript following the reviewer’s comment.

      • I felt that the suggested functional connection between LOXL2/BRD4 and DREAM is not strongly supported by the authors' data. Figure S6E: A similarity score of Fig. EV6E: We agree with the reviewer that a similarity score of Fig. 4E: We thank the reviewer for this comment. The performed pulldown showed that BRD4S, LOXL2, and MED1 interact with Lin9 and B-Myb, but not with FOXM1, thus FOXM1 itself is an internal negative control of the pulldown. Additionally, BRD4L does not show the same interaction pattern as BRD4S, LOXL2, and MED1, again acting as an internal negative control. We, therefore, believe that the pulldown is properly controlled and that the observed interaction is reliable. We furthermore agree with the reviewer that it would be interesting to characterize the interactions between the DREAM complex and BRD4S, LOXL2, and MED1. However, we believe that the dissection of these interactions at the mechanistic level would require a deeper study, which can be a project in itself that we aim to explore in the future. For example, it would be interesting to investigate whether either the inhibition or the downregulation of LOXL2 and/or BRD4S specifically impairs the formation of the DREAM complex or the recruitment of specific DREAM complex subunits, as well as how these effects impair the DREAM complex chromatin binding. We are afraid that the suggested pulldowns would not be sufficient to answer these questions, which would require extensive cross-interaction studies in either BRD4/LOXL2 and BRD4+LOXL2 inhibition or downregulation followed by ChIP-seq and transcriptomics for all the conditions. We believe that the provided data, together with the functional characterization (both in vitro and in vivo) of the phenotypes triggered by BRD4S and LOXL2 inhibition, make a strong case for our manuscript and place the suggested experiments out of scope. We hope the reviewer will understand our explanation and will appreciate that we are planning to pursue this further in the future.

      Fig. 3: We thank the reviewer for this important comment. The ChIP-seq technique very often does not provide exhaustive results due to sequencing depth limits and antibody performance. We believe that the fraction of DREAM target genes found in our dataset as bound by BRD4S is not exhaustive and that the analysis proposed by the reviewer would not lead to clear, conclusive results. However, we understand the importance of verifying that DREAM target genes whose promoter is bound by BRD4 are indeed downregulated when LOXL2 is inhibited. To answer this question, in the revised manuscript we added gene expression analysis of selected DREAM target genes upon treatment with JQ1, PXS, or their combination. We could successfully show that both JQ1 and PXS treatment impair the transcription of the selected DREAM target genes; however, the combinatorial treatment almost shut down their expression, in agreement with our hypothesis (Fig. 5J).

      • The authors state that it is surprising to find that LOXL2 can promote target gene transcription because it is rather known as a transcriptional repressor. To this point, the authors should perform standard analyses using their RNA-seq and ChIP-seq data. Compare differential expression of genes that are bound by BRD4S/L/S+L and genes not bound by BRD4. Perform motif search and enrichment analyses for transcription factor and co-factor binding data (public ChIP-seq repositories). Such analyses may suggest what gene sets are up- and downregulated by LOXL2 through BRD4S/L and what other factors could be involved in LOXL2-dependent up- and downregulation of gene transcription.

      We thank the reviewer for this valuable comment that certainly provides the rationale for a follow-up project. However, we believe that the proposed study goes beyond the scope of our work at this moment.

      Minor:

      • I felt that background information on the BRD4 isoforms was missing. The short and long isoforms of BRD4 should be introduced briefly.

      We agree with the reviewer. In the revised manuscript, we addressed this by presenting BRD4 isoforms in the introduction part of the manuscript.

      • Given that BRD4 inhibition is known to activate p53 (e.g., PMID 23317504 and 33431824) and p21 (PMID 31265875), the authors should discuss the p53 status of their cell lines (largely mutant). In general, I felt that the authors could better cite and discuss the current literature on BRD4 and LOXL2.

      We appreciate the comment of the reviewer regarding p53. Given the fact that p53 is mutant in MDA-MB-231, we believe that the proliferation defect observed with the combinatorial treatment may be due to the activation of alternative cytostatic or cytotoxic signaling cascades, independently of P53 activation. We have now briefly mentioned this point in the manuscript discussion.

      • It was unclear to me why the authors did not actually test experimentally whether their predicted interaction models 2 or 4 are likely true (Figure 2E+G).

      We understand the reviewer’s comment. The fact that JQ1 treatment almost abrogates the interaction between LOXL2 and BRD4S strongly suggests that models 1 and 3 are likely wrong, therefore pointing towards models 2 and 4 as the correct ones. To test whether models 2 and 4 are indeed the correct models we are now performing extensive mutagenesis studies, which are producing preliminary results suggesting indeed that models 2 and 4 are correct. The reason why we did not include this study in the current manuscript, is that we started a parallel line of investigation aimed at identifying residues fundamental for the interaction that can be exploited in compound screening campaigns to identify molecules able to block the described interaction and thus cancer proliferation. Publishing these preliminary results at this stage could jeopardize the drug discovery campaign and we hope that the reviewer will understand our constraints.

      • The transcription of cell cycle genes depends on the cell cycle (i.e., reduced cell cycle entry correlates with reduced cell cycle gene expression). Given that the authors showed that LOXL2 inhibition reduces MDA-MB-231 cell proliferation, they should note that reduced expression of cell cycle-related genes is expected upon LOXL2 knockdown.

      We understand the reviewer’s comment. We believe that we provide sufficient data supporting our hypothesis that LOXL2 controls the expression of cell cycle genes at the transcriptional level together with BRD4S. In addition, the sole inhibition of LOXL2 has practically no effect on tumor proliferation in vivo but largely enhances the antiproliferative effect of low-dose JQ1 (Fig. 6D). We hope these clarifications would satisfy the reviewer.

      • The authors specify in their discussion that their data show a function of LOXL2/BRD4 in the cell cycle interphase, while there were no experiments that support that specific conclusion. At least it is unclear to me why the authors rule out a function in mitosis?

      We thank the reviewer for this comment. We referred to interphase genes because these are the early cell cycle genes, while mitotic genes are the late ones. We do not rule out a possible function for BRD4S and LOXL2 in regulating mitotic progression; however, we believe this would be a consequence of dysregulated G1-S-G2 gene expression, rather than a direct transcriptional effect. This conclusion derives from the fact that while we observe interactions between LOXL2, BRD4S, and MED1 with Lin9 and B-Myb, these are not fully conserved with FOXM1, which is typically required for the transcription of mitotic genes. To avoid confusion, we have nevertheless removed the word “interphase” from the text.

      • I felt that the first part of the manuscript (combination of BRD4 and LOXL2 inhibitors in TNBC) was a bit uncoupled from the functional studies on LOXL2 and its connection to BRD4. The transition between these parts and the final discussion on why the joint control of cell cycle genes by LOXL2/BRD4 may be important for the synergistic effect of LOXL2/BRD4 inhibitors. To this point, the authors' model was not clear to me.

      We really appreciate the reviewer’s comment. To better connect the functional studies with the clinical significance of the proposed combinatorial treatment, we restructured the manuscript. In the revised version, the use of the combinatorial treatment is shown in Figure 6. Moreover, to better explain why we focused all the studies on BRD4 and LOXL2, we also included data from the Cancer Cell Line Encyclopedia (CCLE)-associated chemotherapeutics sensitivity (Fig. 1A and Fig. EV1) showing that LOXL2 expression levels can predict the response to BRD4 inhibition, suggesting a functional interaction between BRD4 and LOXL2 and the possibility of exploiting it for therapeutic purposes. We believe that these data set the rationale to further explore the connection between LOXL2 and BRD4, both at the mechanistic and functional levels.

      Reviewer #2 (Significance (Required)):

      The study by Pascual-Reguant et al. shows that inhibitors of BRD4 and LOXL2 can be combined to achieve better efficacy in reducing proliferation of breast cancer cell lines and breast tumor growth in xenograft models. They provide strong evidence for a functional interaction between LOXL2 and BRD4 and investigate their common transcriptional targets. Intriguingly, some evidence points towards a direct regulation of the DREAM complex and its cell cycle gene targets.

      The findings are novel and can be the basis for further research on TNBC combination therapy using BRD4 and LOXL2 inhibitors. The link to the DREAM complex is preliminary.

      The study is of interest for a basic research audience with some translational aspects.

      I reviewed this manuscript as a researcher in gene regulatory mechanisms, with cell cycle genes as one focus area. I have no expertise in the computational modeling of protein-protein interactions and I am no expert for breast cancer.

      We thank the reviewer for the positive comments. We also would really like to thank the reviewer for their criticism, which, we believe, contributed to a new and improved manuscript version.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this manuscript, Laura Pascual-Reguant et al. identified a novel role of the LOXL2 oxidase in sustaining cell cycle progression through a so far uncharacterized gene-activating function that is mediated by the BRD4S epigenetic reader and exerted on key DREAM-target genes in TNBC. Moreover, the authors showed that combinatorial treatment of TNBC with LOXL2- and BRD4-specific inhibitors results in a tremendous anti-tumorigenic effect. For all findings, they leveraged in vitro and in vivo settings as well as high-throughput sequencing approaches. However, the following points should be addressed and explained.

      Major points:

      -The authors, in their working hypothesis, propose that dual inhibition of BRD4 and LOXL2 is a novel strategy for curing TNBC. For my taste, just because both targets are quite promising for TNBC, the jump to this combinatorial treatment is kind of abrupt. Knowing the difficulty and time-/financial- investment, authors could optionally perform a mass spectrometry analysis on nuclei lysates with LOXL2 pull down to identify physical interactors. Due to the additional resources and analysis of raw data, authors may need a generous revision period (approx. 4 months for starters). This could provide a more unbiased approach to look at nucleus-specific gene-regulatory functions and particularly at epigenetic readers. It would be also interesting to see if LOXL2 interacts with other members of the BRD family. Selecting BRD4 and no other members of the bromodomain family cannot be the only choice given that other BRD members can also interact with several of these mediator subunits.

      We thank the reviewer for the suggestion and we agree that the rationale for combining BRD4 and LOXL2 inhibitors was not sufficiently argued in the first version of the manuscript. For that reason, in the revised manuscript, we added new data to explain why we explored this topic. In particular, to better explain why we focused all the studies on BRD4 and LOXL2, we included data from the Cancer Cell Line Encyclopedia (CCLE)-associated chemotherapeutics sensitivity (Fig. 1A and Fig. EV1) showing that LOXL2 expression levels can predict the response to BRD4 inhibition (but not to other approved chemotherapeutic drugs), suggesting a functional interaction between BRD4 and LOXL2 and the possibility of exploiting it for therapeutic purposes. Moreover, we restructured the manuscript to make the story more linear, explaining first the functionality of BRD4S-LOXL2 interaction at the molecular and cellular levels, and then presenting the in vivo systems in the last part of the manuscript.

      We agree with the reviewer that it may be interesting to explore whether LOXL2 interacts with other BRD family members. However, given the prominent role of BRD4 in promoting cancer proliferation, we believe that understanding the relevance of BRD4S-LOXL2 interaction in TNBC is, per se, of great interest and provides a novel mechanistic understanding of how TNBC proliferation is controlled at the transcription level. In the specific case of TNBC, it has been shown that BRD4S has an oncogenic effect, while BRD4L is an oncosuppressor. In the manuscript, we now show that LOXL2 downregulation sensitizes cells to JQ1 treatment (Fig. 1D). Additionally, while the downregulation of BRD4L does not have any additional effect on cells treated with PXS, the downregulation of BRD4S sensitizes them to LOXL2 inhibition (Fig. EV8B). These results, once again, indicate the relevance of studying the functional interaction between BRD4S and LOXL2.

      -LOXL enzymes have been shown to promote collagen and fibronectin assembly, thereby sustaining the pro-survival effect of the ITG5A/FN1/FAK/SRC signaling cascade and shielding TNBC cells against chemotherapy treatment (32415208). Did authors observe if LOXL2 loss or inhibition decreased the active status of FAK and SRC, which are well known to promote G1-S transition (25381661)?

      Probably the cell cycle defects upon LOXL2 loss may also partially arise from the impairment of this cascade.

      We really appreciate the reviewer’s suggestions. In the revised version of the manuscript, we checked FAK and Src activation status in tumor samples from one of our in vivo experiments (Fig. EV10D). We did not observe any difference in phospho-FAK or phospho-Src upon treatment either with PXS, JQ1, or their combinations, suggesting that alterations in the activity of these factors were not driving the observed proliferation defects.

      -Authors exclusively use JQ1 as a BRD4 inhibitor. As JQ1 may have an unspecific effect on BRD2 as well, authors should consider reproducing key experiments with siControl- and siBRD4-treated cells and increasing doses of PXS as well as repeating the JQ1 dose response assay in Figure 1B using siRNA-mediated silencing of LOXL2. Given that both players are part of the same complex, silencing of one and inhibition of the other should sensitize cells compared to their control counterparts.

      We agree with the reviewer and we addressed this comment in the revised manuscript. In particular, we have added two additional experiments:

      • We transduced MDA-MB-231 cells with isoform-specific shBRD4s (shBRD4L and shBRD4S) (Fig. EV5H) and checked cell sensitivity to PXS treatment (Fig. EV8B). As explained also above, we observed that only when the short isoform of BRD4 was downregulated cells displayed higher sensitivity to PXS treatment. This result corroborates that BRD4S and LOXL2 are required for TNBC proliferation.

      • We transduced MDA-MB-231 cells with shLOXL2 and assessed JQ1 sensitivity (Fig. 1D). We showed that upon LOXL2 downregulation, cells became more sensitive to JQ1 treatment, again corroborating the fact that TNBC proliferation requires BRD4S and LOXL2.

      -Moreover, in Figures 1G and S3D the differential sensitivity of low and high LOXL2 cell lines is unclear. Do authors know if any of these growth kinetic lines represent one of the tested cell lines in Figure 1A-B? Authors should provide respective legends. In addition, authors should take advantage of their homemade data given that they have already selected a panel of TNBC cell lines with various LOXL2 expression at basal state (Figure 1A) for which dose response assays have been performed (Figure 1B). Therefore, I would perform an IC50 graph for JQ1 (without PXS treatment) using the existing data from Figure 1B.

      We apologize if our representation was confusing. In the revised manuscript we have changed the sensitivity plots (Fig. 1A and Fig. EV1) to make them easier to grasp. Additionally, in Figure 1A we included the analysis of CCLE cell lines stratified based on their LOXL2 expression levels. This analysis showed that LOXL2 expression levels could overall predict the response to BETi treatment. As suggested by the reviewer, we also plotted the IC50 of the 3 cell lines tested. However, their JQ1 sensitivity curves did not show any difference that could be attributed to their different LOXL2 levels. We speculate that 3 cell lines are too small a sample to reach a meaningful conclusion, which, in contrast, can be achieved by comparing BETi sensitivity across the CCLE cell lines.
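      For readers less familiar with how an IC50 can be read off a dose-response series such as the JQ1 curves discussed here, the sketch below estimates it by linear interpolation on a log10 dose axis. This is a simplified stand-in chosen for illustration, not the fitting procedure used for the figures (which the response does not describe); the function name is hypothetical, and a four-parameter logistic fit would be the more usual choice when enough dose points are available.

```python
import math


def ic50_by_interpolation(doses, viability_percent):
    """Estimate the dose giving 50% viability by interpolating on log10(dose).

    doses: positive concentrations (any consistent unit).
    viability_percent: viability at each dose, as a percentage of untreated control.
    Returns None when the series never crosses 50%.
    """
    points = sorted(zip(doses, viability_percent))
    for (d_lo, v_lo), (d_hi, v_hi) in zip(points, points[1:]):
        # Look for the pair of neighbouring doses that brackets 50% viability.
        if v_lo >= 50.0 >= v_hi and v_lo != v_hi:
            fraction = (v_lo - 50.0) / (v_lo - v_hi)
            log_ic50 = math.log10(d_lo) + fraction * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_ic50
    return None
```

      With only three cell lines, IC50 values obtained this way give three points per comparison, which is consistent with the authors' remark that the sample is too small to relate JQ1 sensitivity to LOXL2 levels.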

      -In Figure 2D, the pull-down assay is inconclusive, as the molecular weight for each construct is not mentioned. I would probably add this information also in all performed western blots. Also, the overexpression of the BD1/BD2-mutated and especially the BD1/BD2-lacking construct is unclear if it still interacts with LOXL2, probably because of the lack of molecular weight reference of each band. Therefore, the authors should make this pull-down assay more descriptive regarding the size of the bands. Also, BD1 mutagenesis at N140 was shown to dislodge the binding of JQ1 to BRD4 (24497639), which implies that BD1 mutagenesis or overexpression of the BD1-deficient construct should abrogate the interaction of LOXL2 with BRD4, reminiscent to the abrogated interaction of BRD4/LOXL2 upon JQ1 that binds to both BDs (Figure 2F). And, what happens if a BD2-deficient construct is expressed?

      We thank the reviewer for spotting this oversight. We apologize for it, and in the revised version of the manuscript we included molecular weights for all western blots.

      We acknowledge that BD1 mutagenesis displaces JQ1 binding, however, we respectfully disagree that because of this BD1-N140 mutant should not bind to LOXL2. Our docking analysis indeed showed that none of the poses is impaired either by BD1 or BD2 mutagenesis (Fig. EV4D). The fact that JQ1 disrupts the interaction between BRD4S and LOXL2 (Fig. 2F, G) is not due to the fact that they compete for the same binding residue, but rather for the space occupied by JQ1 inside the AcK binding pocket of either BD1 or BD2, which impedes proper binding to LOXL2. Our pulldown data indeed showed that mutant BD1 and BD2 retain the ability to bind to LOXL2 (Fig. 2C), as predicted by the docking.

      We did not try to express constructs either lacking BD1 or BD2 and we cannot speculate what could happen to the BRD4S-LOXL2 interaction in this scenario. Even though this experiment could help dissect the interaction between LOXL2 and BRD4S, we decided to rather perform mutagenesis of specific residues that have been predicted to be important for the interaction. The reason why we did not include this study in the current manuscript, is that we started a parallel line of investigation aimed at identifying residues fundamental for the interaction that can be exploited in compound screening campaigns to identify molecules able to block the described interaction and thus cancer proliferation. Publishing these preliminary results at this stage could jeopardize the drug discovery campaign and we hope that the reviewer will understand our constraints.

      -If authors support that BRD4S is the predominant isoform driving the expression of DREAM-targets, this means that DREAM-targets are mainly bound by BRD4S, relying on Figure 3E-F. However, based on the author's ChIPseq tracks in Figure 3H, DREAM targets such as EZH2 and HMGB2 are co-occupied by both BRD4 isoforms at the basal state on their promoter region. Also, especially for EZH2 and PLK4, authors should set to 'group auto-scale' both conditions in a smaller scale range for ChIPseq- and RNAseq tracks, although I do not see these two genes as good candidates representing your analysis. Therefore, authors should initially show all genes (e.g. in a table format) that enrich the 'DREAM-targets' signature and select for a greater panel of genes (like for AURKB and HMGB2) demonstrating a preferential occupancy of the BRD4S at their promoter region. Finally, authors are recommended to perform a ChIP-qPCR on these genomic regions at basal state (no LOXL2 silencing) to validate the predominant occupancy of BRD4S and the low/absent occupancy of BRD4L at these genomic sites.

      We apologize for the confusion. To make the figure more understandable, we now scaled all the panels to the same scale and highlighted in grey the promoter region of each selected DREAM target gene. As the reviewer can appreciate, none of these genes is bound by BRD4L in basal conditions (Fig. 3F).

      To better characterize the differential binding, following the reviewer’s suggestion, we performed ChIP-qPCR using Ab2 (which recognizes both BRD4 isoforms), in cells either downregulated for BRD4L or BRD4S with isoform-specific shRNAs (Fig. EV5H). Results showed that only the downregulation of BRD4S reduced the binding of Ab2 to the promoter of the selected DREAM target genes (Fig. 3D), corroborating our hypothesis and validating our ChIPseq strategy.

      -Authors in Figure 3G should select an equal-sized population of randomly chosen non-DREAM-target genes, otherwise, the comparison of log2FC difference between these two gene cohorts is unreliable and difficult to make. Mann-Whitney test should also be performed.

      We thank the reviewer for this suggestion, which was added to the revised version of the manuscript (Fig. 3E, lower panel).
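
      For illustration only, the comparison requested above can be sketched as follows. This is a minimal sketch and not the analysis code behind Fig. 3E: the log2FC values are synthetic, the helper names (`equal_sized_control`, `mann_whitney_u`) are hypothetical, and the p-value uses the normal approximation without a tie correction, which is an assumption that is reasonable only for moderately large gene sets.

```python
import math
import random


def equal_sized_control(target_values, background_values, seed=0):
    """Draw a random background sample with the same size as the target set."""
    rng = random.Random(seed)
    return rng.sample(background_values, len(target_values))


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns the U statistic of sample x and the approximate p-value.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        # Find the run of tied values and give each the average (mid) rank.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            if pooled[k][1] == 0:
                rank_sum_x += mid_rank
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u1, p_value


# Synthetic log2 fold changes (hypothetical numbers, for demonstration only).
rng = random.Random(1)
dream_lfc = [rng.gauss(-1.0, 0.5) for _ in range(50)]    # DREAM target genes
other_lfc = [rng.gauss(0.0, 0.5) for _ in range(2000)]   # all other genes

control = equal_sized_control(dream_lfc, other_lfc, seed=2)
u_stat, p = mann_whitney_u(dream_lfc, control)
print(f"n = {len(control)} per group, U = {u_stat:.1f}, p = {p:.3g}")
```

      Because the control set is a single random draw, repeating the draw with several seeds would show how stable the result is across samples.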

      -Authors should repeat the cell cycle analysis (Figure 4A) as the number of cells subjected to flow cytometry is quite discrepant between the conditions. Also, it is not clear if the experiment was performed in at least biological triplicates (although in the respective legend, it is stated so). If performed in biological triplicates, authors should make a new graph where each cell cycle phase cell population differs between the two conditions. Moreover, the difference in cell cycle defects in LOXL2-inhibited cells (Figure 4C) is indifferent compared to their control counterpart. Therefore, authors should address these inconsistencies.

      We thank the reviewer for the suggestion. In the revised version of the manuscript, we represent the cell cycle also as a bar plot with statistical analysis (Fig. 4A, C). Even though the number of cells was the same across conditions, the sub-G1 population of the LOXL2 KD cells may have distorted the profile of the cell cycle. To avoid misinterpretations, we repeated the analysis in the revised version of the manuscript. Statistical analysis supports that LOXL2 inhibition or downregulation has a significant effect on cell cycle progression (Fig. 4A, C, right panel).

      -Furthermore, authors should explain what was the rationale for selecting a mediator subunit and specifically MED1 as a possible interacting partner of LOXL2 and BRD4S, since MED12 and MED24 were also highly essential (Figure 4F).

      We selected MED1 as a Mediator Complex proxy. In our essentiality analysis MED 1, 9, 10, 12, 15, 16, 19, 23, 24, 25 score as significant, suggesting a functional interaction between LOXL2 and the Mediator Complex, rather than a specific subunit. MED1 has been previously described as a BRD4 partner and it is often used in immunofluorescence to visualize transcriptional foci, which made it the best candidate for follow-up study in our project.

      -Moreover, do authors also observe this functional relationship of LOXL2 and BRD4S in cell cycle progression in other breast cancer subtypes presenting a high proliferation index e.g HER2+?

      Presumably, the authors' proposed mechanism applies to a wide panel of breast cancer entities, for which only key experiments could be performed.

      We thank the reviewer for the suggestion. We hypothesized that other cancer types expressing LOXL2 and BRD4S could also benefit from the combinatorial treatment. Indeed, the CCLE drug sensitivity panel in Fig. 1A comprises cancer cell lines of different origins, not just TNBC, and corroborates that the relationship between LOXL2 expression levels and BRD4 sensitivity also exists beyond TNBC. Even though it is important to experimentally verify this hypothesis, we decided to pursue it in the future to broaden the applicability of the proposed strategy in preclinical settings.

      -Authors in Figure 5H represent LOXL2 and BRD4s as integral chromatin looping factors together with MED1 at promoter and enhancer regions. However, this illustration is an overrepresentation of their finding because authors did not address the differential occupancy of BRD4S upon LOXL2 loss in DREAM-target-specific enhancer regions. If they wish to do so, they may use the RANK ORDERING OF SUPER-ENHANCERS (ROSE) package to call for super-enhancer regions in the proximity of DREAM-targets and confirm similar results as for their TSS-proximal sites.

      We thank the reviewer for the useful suggestion. In the new version of the manuscript, we have simplified the representation, which now does not show super-enhancers. However, following the reviewer’s suggestion, we performed super enhancer analysis using ROSE. Results showed that BRD4S binds to super-enhancers more than BRD4L, including DREAM target gene super-enhancers. Additionally, while LOXL2 KD did not alter the binding of LOXL2 to DREAM target gene super-enhancers, it decreased the binding of BRD4S to them (Fig. EV7D, E). Overall, these data are in agreement with our hypothesis that BRD4S together with LOXL2 controls the expression of DREAM target genes.

      -In the current manuscript, authors did not address the translational relevance of their proposed mechanism in the context of conventional therapies. Knowing that several BRD-specific compounds currently undergo clinical trials, authors should address if LOXL2 low (MDAMB468) and high (BT549) cells demonstrate a differential sensitivity to increasing doses of chemotherapy, in the presence or absence of BRD4. By doing that, LOXL2 apart from being a therapeutic target could be also used as a prognostic marker to stratify patients and achieve better response to standard therapies.

      We really appreciate the reviewer’s suggestion and we think this is a fundamental point. In the new version of the manuscript, we have performed further analysis using a greater panel of chemotherapeutic agents from the CCLE sensitivity database. We now show that LOXL2 low-expressing cells show significantly more sensitivity to BETi treatments, but not to conventional chemotherapeutic agents (e.g. doxorubicin, Olaparib, 5-fluorouracil, paclitaxel, etc.) (Fig. 1A and Fig. EV1), which set the rationale to further explore the functional relationship between BRD4 and LOXL2.

      Minor points:

      -In Figure 1D, the authors should convert the y-axis to a logarithmic scale to better represent the differences between JQ1, PXS, and combo. Also, One-way Anova should be performed between JQ1, PXS and combo.

      We don’t understand the reviewer’s suggestion since Fig. 1D (Fig. 6B, right panel in the revised version) is a tumor picture for which the y-axis cannot be converted to a logarithmic scale.

      -In Figure S6F, authors did not show the sensitivity of LOXL2 low and high cell lines for BRD4 KO. If LOXL2-proficient cells are less sensitive to JQ1, based on Figure 1B, authors should consider showing something similar from the gene essentiality database.

      We agree with the reviewer and we apologize for this mistake. We have included the sensitivity of LOXL2 low and high cell lines for BRD4 KO and also for MYC KO (Fig. EV6G).

      -Authors failed to discuss the work from Ozge Saatci et al (PMID: 32415208) regarding LOXL2 in TNBC and ECM reorganization as well as in other cancer entities (PMID: 35428659) in the context of ECM remodeling. Authors should realize that these published works and the current ones are not conflicting but complement each other.

      We thank the reviewer for the suggestion. In the revised version of the manuscript, we discussed this work.

      Reviewer #3 (Significance (Required)):

      SIGNIFICANCE

      The conception and findings are of enlightening significance for TNBC therapy, especially given the lack of targeted therapies in this particularly aggressive breast cancer subtype. Hence, I posit this work as highly relevant for the cancer epigenetics research community interested in characterizing unknown factors that facilitate the gene-activating function of epigenetic readers in health and disease.

      My field of expertise is to uncover epigenetic vulnerabilities responsible for transcriptional plasticity driving drug tolerance in aggressive forms of breast cancer.

      We would like to take the opportunity to thank the reviewer for the relevant suggestions. We strongly believe the revised version of the manuscript has been substantially improved by addressing the comments the reviewer made.

    1. Author Response

      Many thanks for the detailed and sometimes sharp, yet appropriate criticism of our study. It was an incentive for us to carry out additional analyses and to devote more effort to an elaboration of concepts. The outcome is that the results have changed slightly and that we now give more space to a discussion of concepts. We first address here the points raised by more than one reviewer before responding to comments contributed by individual reviewers.

      The points raised can be divided into three thematic groups, 1) conceptual issues, 2) experimental and analytical questions, and 3) comments challenging the novelty of our results. On the first theme, we think it is essential to make a clear distinction between the conceptual and observational domains. As such, the criteria defining a “mirror neuron” and what is meant by the term "mirror mechanism" belong to the conceptual domain. This understanding of terms requires agreement among scientists, but is not experimentally testable. Unfortunately, there is no agreement on how to define a “mirror neuron” and what is meant by “mirror mechanism”. Thus, for the present work, the only option is to refer to specific definitions or to use our own, definitions which try to capture what others, and here most importantly Rizzolatti and colleagues, probably meant. We have adjusted the introduction in an attempt to convey our understanding and usage of the two terms in a hopefully comprehensible manner. Briefly, we use a definition for "mirror neuron" that we take from the first paragraph of the results section of Gallese et al. (Brain, 1996). We do not consider the "properties of mirror neurons" described in that paper as defining a mirror neuron (MN). Classifying neurons as MNs only on the basis of the presence of a modulation of discharge rate during an executed and an observed action compared with a baseline is a common practice also in other single neuron studies on MNs, consistent with this definition. Regarding "mirror mechanism", we refer to Rizzolatti and Sinigaglia (2016) and make a distinction between a broad and a strict definition. 
Given our finding that there are almost no F5 MNs whose activity during observation is a motor representation according to our strict definition of a mirror mechanism, and also given the problem that the term “mirror mechanism” itself is not uniformly understood, the question arises whether and how the term "mirror neuron" should be used in the future. The answer to this may vary and belongs to the conceptual domain. We briefly address this question at the end of the discussion of the revised manuscript.

      From that understanding of terms, conceptual hypotheses are to be distinguished, which of course must allow experimental predictions, i.e., must be falsifiable. We now distinguish more clearly between a "representation hypothesis" and an "understanding hypothesis". Both hypotheses focus on F5 MNs and are based on the strictly defined mirror mechanism. We test the “representation hypothesis” in our study, and just because it is the basis for the “understanding hypothesis”, falsifying the “representation hypothesis” would allow us to conclude that the “understanding hypothesis” is not valid. In contrast, confirmation of the “representation hypothesis” would not, of course, allow us to conclude that the “understanding hypothesis” holds. That would really be circular reasoning (this conclusion was drawn by some and rightly criticized). However, support for the “representation hypothesis” would be the necessary prerequisite for the “understanding hypothesis” to be true. These two hypotheses take up the original argument that a certain understanding of observed actions could follow from an equality of action-specific F5 MN activity during execution and observation. Because we considered the data on equality of action-specific F5 MN activity to be insufficient, we designed this study. Since our result largely argues against the "representation hypothesis" and thus against the "understanding hypothesis," we now discuss alternative concepts for the function of F5 MNs in more detail. It should be noted here that our fourth concept ("goal-pursuit-by-actor") could well represent the observed action without contradiction to our broad definition of a mirror mechanism, which in principle could also serve a subjective experience (which could be conceived as a kind of understanding). The way we structure the concepts in the discussion of this revised manuscript is, in our opinion, a useful overview of the concepts. The third concept is new in this context.
We would like to emphasize that we focus on F5 MNs and intentionally avoid a discussion of mirror neurons beyond F5 in this paper. With the data from this study, we cannot say anything about MNs outside of F5.

      Regarding the key question of how the "understanding hypothesis" is testable, or whether it may not be testable at all, we agree, of course, that for the conclusion of whether F5 MNs contribute to perception, only a manipulation of F5 MNs can clarify it. We now say that explicitly in the introduction. We agree with reviewer #2 that "understanding" here is not limited to "action recognition" or "action categorization”, which in principle could be implemented by purely sensory processing. Therefore, we also do not believe that the approach proposed by reviewer #3, which builds on the distinction of actions, would allow for a critical examination of the "understanding hypothesis”. But we disagree that the "understanding hypothesis" is not testable at all. Operationalization is necessary. If we accept that we can measure certain visual or auditory perceptions of an animal by operationalization (e.g., the subjective visual vertical, see for example Khazali et al., PNAS, 2020), then we must also accept that we can, in principle, measure other subjective experiences by operationalization, such as pain or aiming at a goal or even the co-experience of pain. An example of how to approach this is the study by Carrillo et al. (Curr Biol, 2019), which reviewer #2 and colleagues discussed in a recent review article (Bonini et al., TCS, 2022).

      With regard to the second theme, experimental and analytical questions, we noticed while reading the comments that in our first version we did not distinguish clearly enough between statements about single neurons and statements about populations of neurons. Therefore, we now clearly separate single neuron analysis and population code analysis in the structure of the article. In view of the fact that statements about mirror neurons in the literature mostly refer to single neurons, we added extensive single neuron analyses, so that only now statistically reliable statements about single neurons are possible. This has led to the realization that the number of neurons with exclusively shared code is so small that these neurons should be considered a rare exception. Given the small number of time periods with shared code, we additionally tested against a hypothesis already rightly proposed as an alternative explanation by G. Csibra in 2005 (Mirror neurons and action observation: Is simulation involved? In: What do mirror neurons mean? Interdisciplines Web Forum 2005). We were able to reject this hypothesis based on two of three methods for testing for a shared code. This is the second piece of evidence besides the clustering of time periods with shared code already described in the first version that time periods with shared code cannot be considered random.

      We discuss in more detail the question of whether neurons that exhibit a shared code at least at times support the representation hypothesis. To this end, we additionally examined whether certain action segments are more frequently represented with a shared than with a non-shared code, whether neurons with shared code differ from those with non-shared code in anatomical location, and whether an accuracy can be achieved with a time bin-wise selection of neurons with shared code by population cross-task classifiers as with within-task classifiers in the whole population.

      Another issue was how to test for shared code and how to decide if a code has enough sharing. To answer the question, the exact hypothesis we intended to test here is crucial. The representation hypothesis states that the representation of the observed actions in F5 MNs corresponds to the representation as it occurs during the execution of the same actions. Therefore, the relationship between discharge rate and actions that holds during execution should also hold during observation, which is measurable with a classifier trained on execution trials and tested on observation trials. Moreover, the actions should not be more distinguishable during observation with a classifier other than the execution-trained classifier, because if that were so, it would mean that the representation of observed actions is different from that of executed actions. The detection of a cluster of time bins for which both conditions are satisfied confirms that it is possible to discover in this way the shared codes postulated by the representation hypothesis.
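
      The two conditions described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the analysis used in the study: a nearest-centroid classifier stands in for the linear discriminant classifier, a simple held-out split stands in for whatever cross-validation scheme was applied, and the tuning values, trial counts, and noise level are hypothetical.

```python
import random

ACTIONS = ("twist", "shift", "lift")


def centroids(trials):
    """Fit a nearest-centroid model: mean rate vector per action label."""
    sums, counts = {}, {}
    for label, rates in trials:
        acc = sums.setdefault(label, [0.0] * len(rates))
        for i, r in enumerate(rates):
            acc[i] += r
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(model, rates):
    """Return the action whose centroid is closest to the rate vector."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, rates))
    return min(model, key=lambda lab: dist2(model[lab]))


def accuracy(model, trials):
    hits = sum(predict(model, rates) == label for label, rates in trials)
    return hits / len(trials)


def simulate(tuning, n_trials, rng, noise=1.0):
    """Generate noisy trials from mean rates per action (hypothetical tuning)."""
    return [(a, [rng.gauss(m, noise) for m in tuning[a]])
            for a in ACTIONS for _ in range(n_trials)]


rng = random.Random(0)

# Mean discharge rates of three simulated neurons during execution.
exec_tuning = {"twist": [10, 2, 2], "shift": [2, 10, 2], "lift": [2, 2, 10]}
# Shared code: observation tuning equals execution tuning.
shared_obs = exec_tuning
# Non-shared code: actions are still distinct, but the mapping differs.
nonshared_obs = {"twist": exec_tuning["shift"],
                 "shift": exec_tuning["lift"],
                 "lift": exec_tuning["twist"]}

model_exec = centroids(simulate(exec_tuning, 30, rng))


def evaluate(obs_tuning):
    """Cross-task accuracy (train execution, test observation) and
    within-task accuracy (train and test on separate observation trials)."""
    obs_train = simulate(obs_tuning, 30, rng)
    obs_test = simulate(obs_tuning, 30, rng)
    cross = accuracy(model_exec, obs_test)
    within = accuracy(centroids(obs_train), obs_test)
    return cross, within


cross_shared, within_shared = evaluate(shared_obs)
cross_non, within_non = evaluate(nonshared_obs)
print("shared code:     cross =", cross_shared, " within =", within_shared)
print("non-shared code: cross =", cross_non, " within =", within_non)
```

      In this toy setting a shared code corresponds to cross-task accuracy that is above chance and not exceeded by the within-task classifier, whereas a non-shared code shows good within-task but poor cross-task accuracy.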

      With respect to concerns that the monkey may not have used the cue at all when the action was executed, we added a comparison with control trials with a non-informative cue and also compared the duration of the approach phase between the three actions. Regarding oculomotor behavior, we verified that the monkey had actually directed his gaze toward the action during action observation for all three actions.

      On the third issue, concerning the novelty of our results, we have now explained in more detail in the introduction why we felt it necessary to conduct a study we considered fundamental. As a result of our study, it can be clearly stated now that representations of observed actions as predicted by the strictly defined mirror mechanism are rare in F5 MNs, but nevertheless cannot be dismissed as random. This dispels the objection rightly raised by Csibra in 2005 and contradicts the currently prevailing view that such a representation can only be found at a population level. Even if these representations are ultimately explained by a concept other than the strictly defined mirror mechanism, their existence must be accounted for by any theory of the function of F5 neurons. Moreover, it is also shown that the observed actions are well discriminated with a non-shared code, at times even optimally. This contradicts the notion – which has been widespread for a long time since the work of Gallese et al. (Brain, 1996) – that mapping to motor representations in terms of broad congruence is simply not perfect. The applied cross-task decoding approach seems promising to test also in the future for a shared action code. Finally, reconsideration of alternative concepts has led us to highlight the possibility of a representation of a goal pursuit by the observer.

      Reviewer #1 (Public Review):

      The authors set out to investigate the hypothesis that mirror neurons in ventral premotor area F5 code actions in a common motor representation framework. To achieve this, they trained a linear discriminant classifier on the neural discharge of three types of action trials and test whether the thus trained classifier could decode the same categories of actions when observed. They showed that codes were fully matched for a small subset of neurons during the action epoch, while a wider set of "mirror neurons" showed only poorly matched codes for different epochs.

      This is one of the descriptions of our results, where we realized that in our first version we did not distinguish clearly enough between statements about single neurons and statements about populations of neurons. This prompted us to perform a detailed single neuron analysis.

      The authors controlled for potential visual object confounds by having identical objects be manipulated in three different ways and by having the animal carry out the motor execution in the dark. The main strength of the study lies in the clever decoding approach testing the matched tuning to behavioural categories in a model-free way. The central result is in the identification of the small sub-group of mirror neurons that show true matching during the execution epoch, which can dissociate the three types of action almost perfectly. This aligns well with some previous work while offering a novel avenue to identify and investigate those neurons. The underlying neuronal mechanism and behavioural relevance of these neurons remain an open question. It would have been interesting to understand better whether the specific motor representations at a recording site, for instance identified through microstimulation prior to recording (see Methods), the reaction times on individual trials or the specific gaze targets (object/hand) had a bearing on the decoding performance for a neuron/trial.

      We agree that these are interesting questions.

      In this study, the focus is on testing for a shared code according to a strictly defined mirror mechanism. We have now compared the anatomical locations of neurons with only time bins in which observed actions were discriminated with a shared code (according to one of the methods) to the locations of neurons with only time bins with non-shared code (see last paragraph in Results). We did not find any relevant difference and this is why one cannot expect topographically specific effects of microstimulation.

      We do not expect the reaction time (i.e., the time interval between LED onset and start button release, or the duration of the approach epoch) during execution or observation to have any effect on our results on shared coding as the analysis was based on relative time bins. The observed actions were predominantly distinguished late in the approach epoch, but especially in the manipulation epoch. At this time, reaction time is not expected to have a relevant influence.

      The relationship between gaze/eye position and the activity of mirror neurons, during execution or observation, is an interesting topic in itself. However, for testing for a shared code according to a strictly defined mirror mechanism, it is only relevant that the observing monkey actually observes the action. We have ensured this in our experiment by a fixation window and have now also confirmed that the monkey actually looked into the area of the object during all three actions (see Results, lines 209-219 in the manuscript with tracked changes).

      Ultimately, the uncovered matched mirror representations should in future experiments be tested with causal interventions and linked trial-by-trial to action selection performance.

      The authors put the focus of their discussion on the wider, less well-matched neuronal pool to support an action selection framework, which is of course a valid view and well established in motor representations. From a sensory perspective, sparse coding, as suggested by the small group of "true" mirror neurons identified with the decoding approach, should also be considered as the basis for a possible neuronal mechanism. A particular strength of the paper is that it could give new data and impetus to the important discussion about how motor and sensory coding frameworks come together in cortical processing.

      We have expanded the discussion considerably and also address the possibility of sparse coding.  

      Reviewer #2 (Public Review):

      The paper by Pomper and coworkers is an elegant neurophysiological study, generally sound from a methodological point of view, which presents extremely relevant data of considerable interest for a broad audience of neuroscientists. Indeed, they shed new light on the mirror mechanism in the primate brain, trying to approach its study with a novel paradigm that successfully controls for some important factors that are known to impact mirror neuron response, particularly the target object. In this work, a rotating device is used to present the very same object to the monkey or the experimenter, in different trials, and neurons are recorded while the monkey (motor response) or the experimenter (visual response) performed a different action (twist, shift, lift) cued by a colored LED.

      The results show that there is a small set of neurons with congruent visual and motor selectivity for the observed actions, in line with classical mirror neuron studies, whereas many more cells showed temporally unstable matched or even completely non-matched tuning for the observed and executed actions. Importantly, the population codes allow to accurately decode both executed and observed actions and, to some extent, even to cross-decode observed actions based on the coding principles of the executed ones.

      In my view, however, the original hypothesis that an observer understands the actions of others by the activation of his/her motor representations of the observed actions constitutes circular reasoning that cannot be challenged or falsified, as the author may want to claim. Indeed, 1) there is no causal evidence in the paper favoring or ruling out this hypothesis (and there couldn't be), 2) there is no independent definition (neither in this paper nor in the literature) of what "action understanding" should mean (or how it should be measured). Instead, the findings provide important and compelling evidence to the recently proposed hypothesis that observed actions are remapped onto (rather than matched with) motor substrates, and this recruitment may primarily serve, as coherently hypothesized by the authors, to select behavioral responses to others (at least in monkeys).

      1) One of the main problems of this manuscript is, in my view, a theoretical one. The authors follow a misleading, though very influential, proposal, advanced since the discovery of mirror neurons: if there are (mirror) neurons in the brain of a subject with an action tuning that is matched between observation and execution contexts, then the subject "understands" the observed action. This is clearly circular reasoning because the "understanding" hypothesis uniquely derives from the neuron firing features, which are what the hypothesis should explain. In fact, there is no independent, operational definition of the term "understanding". Not surprisingly there is no causal evidence about the role of mirror neurons in the monkey, and the human studies that have claimed to provide causal evidence of "action understanding" ended up using, practically, operational definitions of "recognition", "match-to-sample", "categorization", etc. Thus, "action understanding" is a theoretical flaw, and there is no way "to challenge" a theoretical flaw with any methodologically sound experiment, especially when the flaw consists of circular reasoning. It cannot be falsified, by definition: it must simply be abandoned. On these bases, I strongly encourage the authors to rework the manuscript, from the title to the discussion, by removing any useless attempt to falsify or challenge a circular concept and, instead, constructively shed new light on how mirror neurons may work and which may be their functional role.

      Please see the response to all.

      2) An important point to be stressed, strictly related to the previous one, concerns the definition of "mirror neuron". I premise that I am perfectly fine with the definition used by the authors, which is in line with the very permissive one adopted in most studies of the last 20 years in this field. However, it does not at all fulfill the very restrictive original criteria of the study in which "action understanding" concept was proposed (see Gallese et al. 1996 Brain): no response to object, no response to pantomimed action or tool actions, activation during execution in the dark and during the observation of another's action.

      We do not agree that the enumerated "very restrictive original criteria" emerge from the Gallese et al. (Brain, 1996) study. Except for the first paragraph in the results section, there is no clear statement on how mirror neurons should be defined.

      If the idea (which I strongly disagree with) was to simply challenge a (very restrictive) definition of mirroring (a very out-of-date one, indeed, and different from the additional implication of "action understanding"), the original definition of this concept should be at least rigorously applied. In the absence of additional control conditions, only the example neuron in Figure 2A could be considered a mirror neuron according to Gallese et al. 1996.

      We have the impression that the question does not distinguish clearly enough between the definition of "mirror neuron" and the definition of "mirror mechanism". In defining "mirror mechanism", we refer to the work of Rizzolatti and Sinigaglia (Nat Rev Neurosci, 2016). We do not think that this definition is out-of-date (see for example the 2018 article by Rizzolatti and Rozzi in Handbook of Clinical Neurology). If the term "mirror mechanism" is to be defined differently, then another term should be used for a new definition or an annotation should be added (such as "version 2"). This would be necessary to avoid unnecessary confusion resulting from unclear terms.

      Permissive criteria imply that more "non-mirror" neurons are accepted as "mirror": the fact that they are permissively named "mirror" does not imply they are mirroring anything as initially hypothesized.

      Even for a neuron that would be classified as a "mirror neuron" according to your previously stated "very restrictive original criteria”, it does not follow that it "mirrors” according to a mirror mechanism. And, of course, it is quite possible that more neurons do not "mirror” according to a mirror mechanism if one tests more neurons.

      (Example neuron in Fig 2B, for example, could be related to mouth, rather than hand, movements, since it responds strongly and similarly around the reward delivery also during the observation task, when the monkey should be otherwise still).

      We agree, it is not excluded that this neuron has a relation to mouth movements. However, since the neuron meets the conditions to be classified as a "mirror neuron", an additional relation to mouth movements would not be relevant. If mouth movements are to be an exclusion criterion, then this would have to be included and justified in the definition of a "mirror neuron".

      Clearly, these concerns impact all the action preference analyses. To practically clarify what I mean, it should be sufficient to note that 74% (reported in this study) is the highest percentage ever reported so far in a study of neurons with "mirror" properties in F5 (see Kilner and Lemon 2013, Curr Biol) and it is similar to the 68% recently reported by these same authors (Pomper et al. 2020 J Neurophysiol) with very similar criteria. Clearly, there is a bias in the classification criteria relative to the original studies: again, no surprise if by rendering most of the recorded neurons "mirror by definition" then they don't "mirror" so much. I suggest keeping the authors' definition but removing the pervasive idea to challenge the (misleading) concept of understanding.

      We think that it is very important to clearly separate "mirror neuron" from "mirror mechanism". And the question arises whether one should not include a mirroring criterion, which is derived from a definition of a mirror mechanism, in the definition of mirror neurons. We address this briefly in the discussion. Ultimately, the point of our study is to find out how many of the - if you want to put it that way - "permissively defined" mirror neurons actually “mirror”. And the answer depends on how one defines “mirror mechanism”. We provide an answer by resorting to a “strictly defined mirror mechanism”. We have now also given throughout the results section the percentages of neurons with certain properties with respect to all measured F5 neurons. This is a reference that allows comparisons among studies, provided that no neurons were directly discarded during recording, which we avoided in our study.

      3) It would be useful to provide more information on the task. Panel B in Figure 1 is the only information concerning the type of actions performed by the monkey and the experimenter. Although I am quite convinced of the generally low visuomotor congruence, there are no kinematics data nor any other evidence of the statement "the experimental monkey was asked to pay attention to the same actions carried out by a human actor". First, although the objects were the same, the same object cannot be grasped or manipulated in the same way by a human and a macaque, even just because of the considerable difference in the size of their hands; this certainly changes the way in which monkeys' and experimenter's hands interact with the same object, and this is a quantifiable (but not quantified) source of visuomotor difference between observed and executed actions and a potential source of reduced congruency.

      We agree, of course, that there are kinematic differences in how a monkey and how a human manipulate the same object. We have not measured the kinematics and thus cannot make a systematic statement about this. We now report in the results section the rather incidental observation that already the reaching trajectories for the three actions differed and show corresponding differences in the timing of the approach epoch. However, for the question of this study, how many neurons are eligible to represent observed actions according to a strictly defined mirror mechanism, the kinematic repertoire of the observed actor is irrelevant. The reference is the F5 mirror neuron activity during the monkey's own action, i.e., how the monkey approaches the object with his hand, how he grasps it, and how he brings it to a certain target position and holds it there. The observed action, according to the strictly defined mirror mechanism, is to be mapped to this reference. Therefore, we did not collect kinematic data. But it is of course a possible explanation for a non-shared code if the strictly defined mirror mechanism does not apply.

      Second, there is little information about monkey's oculomotor behavior in the two conditions, which is known to affect mirror neuron activity when exploratory eye movements are allowed (Maranesi et al. 2013 Eur J Neurosci), potentially influencing the present findings: a ±7 (vertical) and ±5 (horizontal) window at 49 cm implies that the monkey could explore a space larger than 10 cm horizontally and 14 cm vertically, which is fine, but certainly leaves considerable freedom to perform different exploratory eye movements, potentially different among observed actions and hence capable to account for different "attention" paid by the monkey to different conditions and hence a source of neural variability, in addition to action tuning.

      We agree that the topic of the relationship between F5 MNs activity and eye movements is interesting. And we know from the work of Maranesi et al. (2013) that at least larger eye movements during action observation are related to the activity of F5 MNs. In our study, we ensured that the observing monkey was actually observing the action. For this purpose, we used a fixation window. We now additionally verified that the monkey really looked into the area of the object during all three actions (see Results, lines 209-219 in the manuscript with tracked changes). In our study, the fixation window was so small that the monkey could not see the face of the human actor, in contrast to the study of Maranesi et al. (2013). It was mainly the face that attracted the monkey's attention in that study (measured by gaze position). In our study, the risk that the gaze of the observing monkey was out of the fixation window was high when he looked at the human actor's hand above the wrist. The execution of the action by the monkey took place in darkness. We did not use a fixation window because the monkey's own execution of the action can be assumed to direct his attention to the action.

      We cannot rule out the possibility that smaller eye movements during observation, larger eye movements during execution in darkness, covert shifts of spatial attention, or more generally attentional fluctuations have an influence on F5 MNs that might have counteracted a shared action code in our study. However, if this were the case, then the investigated hypothesis that the activity of F5 MNs during action observation is a motor representation according to the strictly defined mirror mechanism would also have to be rejected.

      4) Information about error trials and their relationship with action planning. The monkey cannot really "make errors" because, despite the cue, each object can be handled in a unique way. The monkey may not pay attention to the cue and adjust the movement based on what the object permits once grasped, depending on online object feedback. From the behavioral events and the times reported in Table 1, I initially thought that "shift" action was certainly planned in advance, whereas "lift" and "twist" could in principle be obtained by online adjustments based on object feedback; nonetheless, from the Methods section it appears that these times are not at all informative because they seem to depend on an explicit constraint imposed by the experimenters (in a totally unpredictable way). Indeed, it is stated that "to motivate the monkey even more to use the LED in the execution task, another timeout was active in 30% (rarely up to 100%) of trials for the time period between touch of object to start moving the object: 0.15 (rarely 0.1) for a twist and shift, 0.35 (rarely 0.3s) for a lift". This is totally confusing to me; I don't understand 1) why the monkey needed to be motivated, 2) how can the authors be sure/evaluate that the monkeys were actually "motivated" in this way, and 3) what kind of motor errors the monkey could actually do if any. If there is any doubt that the monkeys did actually select and plan the action in advance based on the cue, there is no way to study whether the activity during action execution truly reflects the planned action goal or a variety of other undetermined factors, that may potentially change during the trials. Please clarify.

      It is true that the three actions could in principle be performed without using the LED as an informative cue. While this is unlikely under the assumption that a monkey prefers the easiest and fastest way to get reward, it remains a possibility. For this reason, we introduced time constraints in a part of the trials. The selection of time constraints and the proportion of trials in which they were applied was a pragmatic compromise between a time limit, at which the LED must be used as an informative cue for action selection in order to comply with the task, and a time span that allows the task to be completed even when overall motivation is low. The latter takes into account the general experimental experience that a monkey's engagement or motivation in such experiments varies across trials, sessions, and days. To evaluate whether the LED color was, indeed, used as a cue for action planning in the execution task, we randomly interleaved trials with a different LED, non-informative regarding the type of object, as a control in 5% of the trials. We compared the behavioral responses in trials with informative cues and those with a non-informative cue. The behavioral analysis established that both monkeys indeed used the informative cues to guide their choices (see Fig. 1D).

      Further evidence that the monkey used the cue for action selection and planning is the finding that the type of action was encoded before the release of the start button and then further during the approach phase, i.e., much earlier than somatosensory feedback about the manipulability of the object was available (see Fig. 3A and Fig. 6A).

      Regarding the question, which "motor errors" were possible: The answer can be found in the description of the cases in which a trial was aborted (see Material and methods): releasing the start button too early (< 100 ms after turning on the LED), manipulating the object too slowly after touching it (the time constraints mentioned), not holding the object until the reward was given, or not performing the task at all (10 s timeout).

      5) Classification analysis. There seems to be no statistical criterion to establish where and when the decoding is significantly higher than chance: the classifier performance should be formally analyzed statistically. I would expect that, in this way, both the exe-obs and the obs-exe decoding may be significant. Together with the considerations of the previous point 2 about the permissive inclusion criteria for mirror neurons, this is a remarkable (even quite unexpected) result, which would prove somehow contrary to what the authors claim in the title of the paper. The fact that in any classification the "within task" performance is significantly better than the "between task" performance does not appear in any way surprising, considering both the inclusive selection criteria for "mirror neurons" and the unavoidably huge different sources of input (e.g. proprioceptive, tactile, top-down, etc. afferences) between execution and observation. So, please add a statistical criterion to establish and show in the figures when and where the classifications are significantly above chance.

      We have added - in addition to the statistics already performed in the first version (Fig. 3A in the previous version, now Fig. 6A) - a number of analyses including statistics. This mainly concerns the analyses regarding a shared code at the single neuron level, in which we additionally tested against the null hypothesis proposed by Csibra in 2005 using permutation tests. And we have now also calculated confidence intervals for the population classifications that allow the comparison with chance level. We re-performed the classification analyses using eight-fold cross-validation. We also added a statistical analysis to the finding of clustering of time periods with shared code (Fig. 4). In Figure 5, we additionally compared the frequency of action segments with shared and non-shared codes, which is a descriptive, exploratory analysis. For this reason, it does not make sense to perform inferential statistics. Overall, these analyses represent a significant expansion of the analyses in the first version. We have done this primarily to arrive at statistically sound conclusions at the single neuron level.

      Regarding the comparison between within-task classification (o2o) and cross-task classification (e2o), it is important to keep in mind that the goal was to test the hypothesis that the activity of F5 MNs during action observation is a motor representation of the observed action according to the strictly defined mirror mechanism. This hypothesis requires both, 1) an above chance level accuracy of the e2o classifier and 2) no better accuracy of the o2o classifier as compared to the e2o classifier. If the o2o classifier were better, then the actions would not be represented as they are executed. And the reference in this hypothesis is the motor representation, that is, the code at execution. Thus, the direction e2o classification is the crucial one, not the reverse direction (o2e). One explanation for the fact that o2o shows better accuracy in the population may be the different sensory inputs mentioned above. In this case, the tested hypothesis has to be rejected and replaced by another one, which should then have a different name.

      Nevertheless, we also show the result of the o2e cross-task classification in Fig. 6 (yellow curve), which was already included in Fig. 3 of the first version. However, we do not address it in more detail in the main text because it is not relevant for the hypothesis to be tested. It is only a reportable additional result.
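
      To make the logic of the e2o versus o2o comparison concrete, the following is a minimal sketch on simulated data, not the analysis code used in the study. It assumes a nearest-centroid readout as a simple stand-in for a linear classifier, Gaussian trial noise, and hypothetical function names (`compare`, `within_task_cv`, and so on). It trains on "execution" trials and tests on "observation" trials (e2o), and compares this with eight-fold cross-validated classification within observation (o2o).

```python
import random


def centroid(trials):
    """Mean population vector over a list of trials."""
    n = len(trials[0])
    return [sum(t[i] for t in trials) / len(trials) for i in range(n)]


def fit(X, y):
    """Nearest-centroid model: one mean vector per action label."""
    return {c: centroid([x for x, lab in zip(X, y) if lab == c])
            for c in sorted(set(y))}


def predict(model, x):
    """Return the label whose centroid is closest to trial x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, model[c]))
    return min(model, key=dist2)


def accuracy(model, X, y):
    return sum(predict(model, x) == lab for x, lab in zip(X, y)) / len(y)


def simulate(tuning, n_trials, rng, noise=1.0):
    """Draw noisy trials; tuning[action][neuron] is the mean rate (assumed)."""
    X, y = [], []
    for action, means in enumerate(tuning):
        for _ in range(n_trials):
            X.append([rng.gauss(m, noise) for m in means])
            y.append(action)
    return X, y


def within_task_cv(X, y, k, rng):
    """k-fold cross-validated accuracy within one task (here: o2o)."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    accs = []
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        accs.append(accuracy(model, [X[i] for i in fold], [y[i] for i in fold]))
    return sum(accs) / k


def compare(shared, seed=0, n_neurons=20, n_trials=40):
    """Return (e2o, o2o) accuracy for a shared or a non-shared code."""
    rng = random.Random(seed)
    exe_tuning = [[rng.uniform(0, 5) for _ in range(n_neurons)] for _ in range(3)]
    if shared:
        obs_tuning = exe_tuning  # observation reuses the execution code
    else:
        obs_tuning = [[rng.uniform(0, 5) for _ in range(n_neurons)] for _ in range(3)]
    X_exe, y_exe = simulate(exe_tuning, n_trials, rng)
    X_obs, y_obs = simulate(obs_tuning, n_trials, rng)
    e2o = accuracy(fit(X_exe, y_exe), X_obs, y_obs)   # train exe, test obs
    o2o = within_task_cv(X_obs, y_obs, 8, rng)        # train obs, test obs
    return e2o, o2o


if __name__ == "__main__":
    print("shared code:     e2o=%.2f o2o=%.2f" % compare(shared=True))
    print("non-shared code: e2o=%.2f o2o=%.2f" % compare(shared=False))
```

      In this toy setting, a shared code yields e2o and o2o accuracies that are both high and close to each other, whereas with an unrelated observation code o2o stays high while e2o, averaged over random seeds, tends toward chance (1/3 for three actions). The hypothesis tested in the study corresponds to the first pattern.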

      6) "As the concept of a mirror mechanism posits that the observation performance can be led back to an activation of a motor representation, we restricted this analytical step to a comparison of the exe-obs and the obs-obs discrimination performance". I don't understand the rationale of this choice. The so-called "concept" of mirror mechanism in classical terms posits that mirror neurons have a motor nature and hence their functioning during observation should follow the same principle as during action execution. But this logical consideration has never been demonstrated directly (it is indeed contested by several papers), and when motor neurons are concerned (e.g. pyramidal tract neurons, see Kraskov et al. 2009) their behavior during action observation is by far more complex (e.g. suppression vs facilitation) than that hypothesized for classical "mirror neurons". Furthermore, when across-task decoding for execution and observation code has been used, both in neurophysiological (e.g. Livi et al. 2019, PNAS) and neuroimaging (Fiave et al. 2018 Neuroimage) data, the visual-to-motor direction typically produces better performance than the opposite one. Thus, I don't see any good reason not to show also (if not even just) the obs-exe results. Furthermore, I wonder whether it is considered the possible impact of a rescaling in the single neuron firing rate across contexts, as the observation response is typically less strong than the execution response in basically all brain areas hosting neurons with mirror properties, and this should not impact on the matching if the tuning for the three actions remains the same (e.g. see Lanzilotto et al. 2020 PNAS). The analysis shown in Figures 4 and 5 is, for the rest, elegant and very convincing - somehow surprising to me, as the total number of "congruent" neurons (7.5%) is even greater than in the original study by Gallese et al. (5.4%).

      As to the rationale of our approach, please see our response to the previous point.

      On the issue of rescaling: the hypothesis tested here requires that the F5 MNs activity on observation is a motor representation of the observed action. Hence, from the activity during observation the action should be just as readable as from the execution-related activity. If we had to use rescaling to find a shared code, then observed actions would not be represented in F5 MNs in the same way as on execution. Additional information on whether the action is being executed or observed would be needed. This would of course be possible in principle, but would contradict the hypothesis. And we then not only have the difficulty of which readout is the physiological one (here we make a parsimonious assumption with a linear readout), but we would have to make an additional assumption about rescaling. For this study, we have now chosen the solution of performing the action preference analysis on a single neuron level in a statistically clean way. This represents a very liberal form of rescaling, as it only tests whether the action with the highest or lowest discharge rate is the same when executed and observed. That is, if the result here is not fundamentally different, which is the case, then it can also be assumed that one does not get qualitatively different results for other forms of rescaling.
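
      The point that an action preference comparison is insensitive to rescaling can be illustrated with a short sketch. This is an illustration under stated assumptions, not the statistical procedure of the study: the firing rates are invented, the function names (`extreme_action`, `shares_preference`, `rescale`) are hypothetical, and no permutation test is included. It only checks whether the action with the highest (or lowest) mean rate is the same in execution and observation.

```python
def extreme_action(rates_by_action, pick=max):
    """Action with the highest (pick=max) or lowest (pick=min) mean rate."""
    means = {a: sum(r) / len(r) for a, r in rates_by_action.items()}
    return pick(means, key=means.get)


def shares_preference(exe, obs, pick=max):
    """True if execution and observation agree on the extreme action."""
    return extreme_action(exe, pick) == extreme_action(obs, pick)


def rescale(rates_by_action, gain, offset=0.0):
    """Apply a gain and offset, e.g. a weaker response during observation."""
    return {a: [gain * r + offset for r in rs]
            for a, rs in rates_by_action.items()}


# Invented single-neuron rates (spikes/s) for the three actions.
exe = {"lift": [20, 22, 19], "twist": [35, 33, 36], "shift": [12, 14, 11]}

# Observation response as a weaker, shifted copy of the execution response.
obs_scaled = rescale(exe, gain=0.3, offset=2.0)

# Observation response with a different action ordering.
obs_other = {"lift": [9, 10, 8], "twist": [5, 6, 4], "shift": [7, 7, 6]}

if __name__ == "__main__":
    print(shares_preference(exe, obs_scaled))           # same preferred action
    print(shares_preference(exe, obs_scaled, pick=min)) # same least-preferred action
    print(shares_preference(exe, obs_other))            # preference differs
```

      Any positive gain and any offset leave the ordering of the mean rates unchanged, so a neuron whose observation response is a scaled copy of its execution response passes this comparison, while a neuron with a different action ordering does not.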

      7) The discussion may need quite deep revision depending on the authors' responses and changes following the comments; for sure it should consider more extensively the numerous recent papers on mirror neurons that are relevant to frame this work and are not even mentioned.

      The discussion has been thoroughly revised considering the comments raised and suggestions of this and the other two reviewers.

      Reviewer #3 (Public Review):

      Mirror neurons are a big deal in the neuroscience literature and have been for thirty years. I (and many others) remain skeptical of whether they serve the functions often attributed to them - specifically, whether they are motor planning neurons that contribute to understanding the actions of others. Testing their functions, therefore, is of great interest and importance. The present study, however, is not a cogent or convincing test. I do not think this study helps to answer the questions surrounding mirror neurons. It purports to provide a crucial test, that comes out mostly against the mirror neuron hypothesis, but the test has too many weaknesses to be convincing.

      Thank you for the clear words. We take from it, first of all, that in the first version of the manuscript we failed to convey the relevance of our study for the discussion of mirror neuron function. The concerns of this reviewer are in line with those of the others and are addressed in our response to all three reviewers.

      First, consider that the motor tuning and the visual tuning match "poorly." How poor or good must the match be before the mirror neuron hypothesis is rejected? I do not know, and the study does not help here. Even a "poor" match could contribute significantly to a social perception function.

      The specific hypothesis tested here assumes that an action-specific activity of F5 MNs evoked by observed actions corresponds to an action-specific activity of these actions if executed. The approach taken here to compare cross-task classification accuracy (execution-trained, tested in observation) with within-task classification accuracy (observation-trained, tested in observation) tests this hypothesis. The fact that we found a cluster of time periods of single neurons in which both accuracies are almost equal supports this approach and also the hypothesis for these time periods. In principle, of course, the decision for the presence of a difference or equality is always only a statistical statement and contains assumptions. For example, the assumption that a linear readout has physiological relevance enters here. But this problem exists in all studies that ultimately try to understand biological neuronal networks in order to explain perceptions and behavior. However, it is such studies that attempt to elucidate what information is contained in which neurons that set the stage for experiments that, in the optimal case, manipulate certain neurons in a particular way in order to then measure the behavior of an animal that is just right for those neurons.

      Second, the results remind me in some ways of other multi-modal responses in the brain. For example, in the visual area MST, neurons are tuned to optic flow fields that imply specific directions of self-motion. Many of the same neurons are tuned to vestibular signals that also imply specific directions of self-motion. But the optic flow tuning and the vestibular tuning are not perfectly matched. There is considerable slop and complexity in how the two tunings compare within individual neurons. That complexity is not evidence against multi-modal tuning. Instead, it suggests a hidden-layer complexity that is simply not fully understood yet. Just so here, the fact that the apparent motor tuning and apparent visual tuning match "poorly" is not evidence against both a motor planning and a visual encoding function.

      We hope that it is now clearer, in contrast to the first version, that we tested a specific hypothesis that is only a prerequisite for the hypothesis of a very specific form of understanding. Referring to the example, the hypothesis analogous to ours would be that the representation of self-motion direction due to optic flow ("observation") corresponds to the representation of self-motion direction due to vestibular stimulation ("execution"). If it were then found that the self-motion direction due to optic flow cannot be predicted from a classifier trained on vestibular stimulation, and that another classifier trained on optic flow performs better, then the hypothesis would have to be rejected. This is then a reason to realize that "everything is a bit more complex" and to search for better explanations.

      Third, the animals are massively over-trained in three actions. They perform these actions and see them performed thousands of times toward the same object. Surely, if I were in the place of the monkey, every time I saw the object, I'd mentally imagine all three actions. As I saw a person act on the object, I'd mentally imagine the alternative two actions at the same time. Even if the mirror neuron hypothesis is strictly correct, this experiment might still find a confusion of signals, in which neurons that normally might respond mainly to one action begin to respond in a less predictable way during all three trial types.

      In our study, we tested a specific hypothesis related to the time an action is observed. Here, you suggest an alternative hypothesis. The question is whether this alternative hypothesis better explains the result of our study. The alternative hypothesis can be formulated as follows: the F5 MNs activity elicited by an observed action in this experiment corresponds to a mixture of the activities that occur when the other two actions are executed. This hypothesis is to be rejected because it fails to explain why a shared code occurs in single neurons and why cross-task population classifiers show an accuracy above chance level. A modified alternative hypothesis, which states that what is represented in the experiment during observation is a mixture of all three actions, cannot explain why the three actions are very well represented in the population and are optimally represented exactly when the target position of the object is reached.

      Fourth, the experiment relies on a colored LED that acts as an instructional cue, telling the monkey which action to perform. What is to stop the neurons from developing a cue-sensitive response, as in classic studies from Steve Wise and others in the premotor cortex? Perhaps the neuronal signal that the experimenters are trying to measure is partly obscured by other, complex responses influenced in some manner by the instructional cue?

      In principle, there is the possibility that purely sensory information is also represented in area F5, at least in some neurons or at certain points in time. We take your suggestion and discuss this as one of the alternative concepts (we call it "sensory concept"). However, several findings argue against this concept. For example, neural responses to cues usually represent the subsequent action, but not sensory information of the cue such as the color of the cue. In our study, it is evident from Figure 3A, 6A and 6B that during action execution, actions are discriminated even before the start button is released. Since this discrimination of actions occurs with a time delay after the cue and then increases continuously, this is evidence that the action to be executed is represented, but not the cue itself.

      Fifth, finally, and most importantly, the fundamental problem with this study is that it is correlational. Studies that purport to test the function of a set of neurons, and do so by use of correlational measurements, cannot provide strong answers. There are always half a dozen different interpretations and caveats, such as the ones I raised here. Both sides of a debate can always spin the results, and the arguments are never resolved. To test the mirror neuron hypothesis properly would require a causal study. For example, lesion area F5 and test if the monkey is less able to discriminate the actions of others. Or, electrically microstimulate in area F5 and test if the stimulation interferes (either constructively or destructively) with the task of discriminating the actions of others. Only in this way will it be possible to answer the question: do mirror neurons functionally participate in understanding the actions of others? The present study does not answer that question.

      We would like to reiterate that studies aimed at elucidating what information is contained in which neurons or areas are necessary to understand neural network processes and are a prerequisite for conducting well-considered experiments that measure behavioral effects through specific manipulation of the neural network. Without the work of Gallese, Rizzolatti and colleagues, the idea of associating F5 neurons with action understanding would not have occurred in the first place. The current tricky question is whether at all, and if so, to what understanding, to what perception, to what behavior that uses information about mental states of another, F5 MNs might be able to contribute. And for this, it helps to have a clearer idea of what information is contained in F5 MNs during action observation.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Septate junctions provide the barrier function in insect tissues, serving as analogs to the vertebrate tight junctions. Here the authors explore an interesting question-how do epithelial tissues respond to loss of barrier function in vivo. They use a powerful and well-studied system, the Drosophila pupal notum, which allows them to bring powerful genetic tools to bear and use state of the art imaging. Their data are lovely and carefully quantified. Together, they reveal some significant surprises. 1. Disrupting septate junctions leads to elevated accumulation of adherens junction proteins and myosin, and reduced apical area. 2. Disrupting septate junctions led to accumulation of many ESCRT-0-positive vesicles and of enlarged ESCRTIII vesicles. 3. Disrupting septate junctions led to elevated accumulation of Crumbs apically and of integrin-based focal adhesions basally. These observations are well supported by the data and in the results section conclusions are carefully drawn. I had some relatively minor comments outlined below about the results. My only significant suggestion concerns the Abstract and Discussion. The Abstract includes a statement that goes well beyond the data shown, and the Discussion is sometimes hard to follow. With these issues corrected, this will provide important new insights for cell and developmental biologists.

      1. The Abstract states: "We report that the weakening of SJ integrity, caused by the depletion of bi- or tricellular SJ components, reduces ESCRT-III/Vps32/Shrub-dependent degradation and promotes instead Retromer-dependent recycling of SJ components." This is too strong, as the role of the retromer, while plausible, is not directly tested. It's fine to speculate about this in the Discussion but drawing a conclusion like this in the Abstract is unwarranted.
      2. Similarly, the title suggests that "ESCRT-III-dependent adhesive and mechanical changes are triggered by a mechanism sensing paracellular diffusion barrier alteration". They show that knocking down septate junctions alters localization of vesicle trafficking machinery, and that it leads to alterations in apparent recycling of cargo, but do they ever really assess whether these changes are ESCRT-III-dependent? Wouldn't this require knocking down ESCRT-III in cells with defects in septate junctions? There was a lot of data in this paper and perhaps I missed it but was this experiment done? I am not suggesting they do it, but that they temper this conclusion if not.
      3. The authors assessed "poly-ubiquitinylated proteins aggregates appearance, marked using anti-FK2". They need to define FK2: what does it detect?
      4. Fig 4: is this a clone, and are we far from the boundary? Make this clearer.
      5. The authors state: "Despite these apparent similarities, we noticed that, in contrast to Shrub depletion, NrxIV did not accumulate in enlarged intracellular compartments upon Cora depletion" Could the authors reference a Figure here?
      6. The authors state: "Hence, if both Shrub and bSJ/tSJ defects lead to Crumb enhanced signals" It might be better to say "altered" as they then point out the differences.
      7. I found the Discussion challenging to follow. Rather than focusing on the core observations, it addresses many, not very well-connected speculative possibilities, and in my opinion, will be challenging for most readers to follow. I would encourage the authors to revisit it from top-to-bottom.

      Referees cross-commenting

      I think we largely agree that the authors present important data, but that certain points need to be better explained or more clearly documented. While Reviewer 1 is correct that adding context about the basolateral polarity proteins would be helpful, I do not feel as strongly about this as a deficit. The authors did not manipulate Scrib, Dlg or Lgl, and I think their polarity functions may be distinct from those of the more "structural" septate junction proteins analyzed here.

      Significance

      Septate junctions provide the barrier function in insect tissues, serving as analogs to the vertebrate tight junctions. Here the authors explore an interesting question-how do epithelial tissues respond to loss of barrier function in vivo. They use a powerful and well-studied system, the Drosophila pupal notum, which allows them to bring powerful genetic tools to bear and use state of the art imaging. Their data are lovely and carefully quantified. Together, they reveal some significant surprises. 1. Disrupting septate junctions leads to elevated accumulation of adherens junction proteins and myosin, and reduced apical area. 2. Disrupting septate junctions led to accumulation of many ESCRT-0-positive vesicles and of enlarged ESCRTIII vesicles. 3. Disrupting septate junctions led to elevated accumulation of Crumbs apically and of integrin-based focal adhesions basally. These observations are well supported by the data and in the results section conclusions are carefully drawn. I had some relatively minor comments outlined below about the results. My only significant suggestion concerns the Abstract and Discussion. The Abstract includes a statement that goes well beyond the data shown, and the Discussion is sometimes hard to follow. With these issues corrected, this will provide important new insights for cell and developmental biologists.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this paper, using the triploid biotype of the planarian Schmidtea polychroa, the first half presents the results of the analysis of genome structure and the second half shows that (de novo) mutations in individuals that undergo regeneration are passed on to the next generation.

      While I think this paper contains interesting biological findings, I am skeptical about its novelty. I was convinced by the results and discussion of the analysis of genome structure, but the results and discussion of the analysis of (de novo) mutations were very confusing. This may be due to my lack of knowledge in this field. But even so, the authors need to improve this manuscript so that the general reader will better understand it.

      Major comments:

      1. The authors mention that it is important to note that this study was conducted using a parthenogenetic triploid biotype. However, I think that the parthenogenesis undergone by a triploid biotype of S. polychroa is very unusual. It is not typical apomictic parthenogenesis. Triploid oocytes arise by meiosis from hexaploid oocytes derived from triploid adult somatic stem cells called neoblasts. On the other hand, haploid sperm arise by meiosis from diploid spermatogonia derived from neoblasts. Embryogenesis of triploid eggs then occurs by pseudogamy. Occasional sex is also known to occur even if the offspring's chromosome number remains triploid. I think this background is important information to give the reader. Also, don't the authors need to treat the results in this paper with this complex phenomenon also taken into account?
      2. Fig. 4B-C: Analysis by lineage-specific mutations of parental controls.
         The authors do not specifically mention or discuss this result. What about the accumulation of mutations within such populations in typical parthenogenesis (daphnia and aphids)? In other words, are the results in Fig. 4B-C due to the special mechanism for parthenogenesis in the triploid S. polychroa as described above?
      3. Throughout this paper, the authors show that regeneration increases de novo mutations in the progeny. The authors conclude that many of the mutations occurred in neoblasts during regeneration. However, I would like you to explain the biological significance of these results in S. polychroa, which naturally does not reproduce by fission and regeneration. There are already reports of mutations accumulating in neoblasts in Dugesia japonica, which reproduces asexually by fission. For these reasons, I do not think this paper presents extremely novel results.
      4. p15, Discussion:
         "Tissue regeneration is best seen in the liver of mammals, and the regrowth of relapsed tumours following surgery can also be considered an example of a regenerative process. Mutagenesis accompanying these processes is relevant to subsequent tumorigenesis or the development of resistance, and the planarian system can provide a useful model for the mutagenic effect of tissue regeneration."

      Isn't it an overstatement to associate the regenerative system of planaria with the liver regeneration of mammals?
      5. p10, Results:
         "We compared the two de novo spectra to the spectrum of germline heterozygous SNPs, present in all animals, and found that the pattern of germline substitutions resembled more closely the de novo spectrum of the control group (Fig 5D, Fig S3), implying that regeneration has a minor contribution to germline mutations in S. polychroa populations."
         p14, Discussion:
         "The high similarity of the spectrum of heterozygous SNPs and de novo mutations of control animals suggests that the species primarily reproduces in a non-regenerative manner. The increased mutation rate and the altered mutation spectrum upon regeneration confirmed our hypothesis that regeneration is a mutagenic process."

      I was very confused by these sentences and it took me some time to understand them. Triploid S. polychroa naturally does not reproduce by fission and regeneration, namely a non-regenerative manner. I do not understand why the author insists on this. Please explain the results for the regenerated case in Fig. 5D (0.88) in a way that is also easy to understand. Also, what is the biological significance of asserting here that de novo mutation by regeneration increases in a species that does not increase by regeneration and division in the first place?

      Minor comments:

      1. The author should add a schematic diagram showing the distribution of reproductive organs in Fig.1 to help the reader understand that the ovaries are not included in the regenerative fragment.
      2. P12, line12: Fig 6D-E, it's F, not E, right?
      3. P9, line 8:
         "these mutations were missing from the original egg but were present in the egg laid by the parent and thus represent the total mutation load of a generation."

      The author mentions that the de novo mutation found in offspring derived from parents that do not undergo regeneration was already present in the eggs, but I can find no evidence of this. Can you rule out the possibility that these mutations occurred between hatching and adulthood?
      4. p10, Results:
         "Interestingly, the majority of mutations were shared in the siblings F4A and F4B. This suggests that the germ cells of these animals were descendants of the same stem cell, which underwent a high number of cell divisions early during the regeneration process prior to oocyte differentiation. The same finding also confirms that the detected clonal filial mutations were present in the respective oocyte and were not generated by embryonic cell divisions."

      The shared de novo mutations detected in the siblings (F4A and F4B) derived from the parent that underwent regeneration in Fig. 5A suggest that the germ cells of these siblings are descended from the same stem cell. The authors say that these mutations occurred in a large number of cell divisions early in the regenerative process prior to oocyte differentiation.

      So why is there no shared de novo mutation in the siblings (Fc4A and Fc4B) derived from the non-regenerating parent in Fig. 5A? As mentioned in Minor comment 3, the author states that the de novo mutations were already present in the parent-laid eggs, but when did these mutations, which are not shared, arise?
      5. p11, Results:
         "Interestingly, in the case of FR4A-FR4B sibling pair, shared de novo mutations present in both were subclonal in R4 in a proportion comparable to the other samples (7/15 by WGS, 46.7%), while the three unique mutations could not be detected in R4 by the PCR approach, indicating again that the unique mutations, which amounted to approximately 10% of total clonal filial mutations in these two animals, arose late during germ cell regeneration."

      The expression "during germ cell regeneration" is too vague to know which stage you are referring to. In relation to minor comment 4, why not create a new chart to clearly show when the expected mutations occurred?
      6. p12, Results:
         "Altogether 7/30 regenerant mutations were detected in PR animals, and these included those with the highest AF in the regenerants (Fig. 6C). This suggests that parental animals, even before regeneration, contained a diverse set of stem cells, and some of the detected de novo mutations in the filial generation resulted from the expansion of mutation-containing stem cell clones contributing also to germ cells in the regenerant animals."

      If the mutation in the offspring is derived from the parent (PR) prior to the time of tail amputation, wouldn't it be strange to assume that it is a de novo mutation?
      7. p12, Results:
         "The remaining 23/30 R- subclonal mutations may have arisen during regeneration. On average, ~250 dividing neoblasts were detected in cut tails of animals from the same population as the sequenced individuals, as determined by immunofluorescence of phosphorylated H3 histone (Fig 6D-E). However, the high proportions of body cells carrying regenerant-specific mutations suggest that certain stem cells contribute to disproportionately large parts of the regenerated body, including the germline."

      I did not quite understand the relevance of this discussion to the photos of M-phase cells shown here (Fig. 6E).

      Significance

      General assessment: This paper contains important biological information. The finding that mutations in planarian stem cells cause diversity in the next generation of parthenogenesis is very interesting. However, I think that the author needs to carefully explain and change his argument, for example, that the mutations were caused by regeneration, which does not naturally occur in the species used.

      Advance: The finding that accumulation of mutations is occurring in planarian stem cells has already been reported in Dugesia japonica. Please cite the papers and clarify what is the key finding in this paper.

      Audience: Basic Research_Evolutionary Ecology, Developmental Biology (Stem Cells), Reproductive Biology

      Please define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      My field of study is reproductive biology. I am familiar with the transcriptome but unfamiliar with genome analysis.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We thank the three reviewers for carefully reading our manuscript and for all considerations, ideas, suggestions, and comments. These were all very helpful in strengthening the scientific statements of our manuscript. Please note that all changes are marked in red in the manuscript and supplement. Below you will find our point-by-point responses to all questions and comments.

      Reviewer #1 (Evidence, reproducibility and clarity):

      Overall, this is an exciting work. There are, however, several open questions that the authors could address to facilitate understanding of their work. These points are:

      1.) On page 5, lines 113ff, the authors mention the membrane bulges that they analyse in figure 1. They show these deformations by light (confocal) and electron microscopy. However, the bulges seen by confocal microscopy seem to be bigger than those seen by electron microscopy. The authors could quantify the sizes of the bulges for clarification.

      We quantified the size of the membrane bulges. By confocal microscopy, the identified bulges had a mean size of 750nm (n=12), with a minimum of 650nm and a maximum of 890nm. By TEM, we measured a mean size of ~243nm (n=61), with a minimum of 62nm and a maximum of 442nm. These measurements are shown in Figure 1E.

      Please note that measurements of TEM images do not always capture the three-dimensionality of bulges and may show only parts of them. In addition, ultrastructural analysis is more sensitive and can easily detect small membrane changes that we cannot observe with confocal and airyscan microscopy. Even with our high-quality objective (63x Zeiss Plan Neofluar, glycerin, 1.3 NA), standard confocal analysis is limited to ~200nm in the XY plane (airyscan ~110nm) and ~450nm along the Z-axis. Therefore, TEM analysis detects smaller bulges than confocal analysis, and consequently this method detected a large range of bulge lengths between 63nm and 441nm. In contrast, the airyscan method detected a range of bulge lengths between 0.65 and 0.83 µm. Nevertheless, both confocal and TEM analyses provide evidence of membrane bulges in pio mutant embryos. Please note that we extended our studies and now show membrane bulges in two different pio mutant alleles (17C and 5M) with airyscan microscopy.

      2.) The subject of the manuscript is rather complicated; presentation of data from Figure 1C and D on lines 113ff and 169ff is confusing.

      We apologize and thank the reviewer for careful reading. We revised both paragraphs (lines 108 – 123 and lines 166 - 174) and are confident that the descriptions are now much more understandable. All changes are marked in red.

      3.) The quality of the sub-images of Figure 2E differs. Especially, the phenotype of the wurst, pio transheterozygous embryo is not well visible.

      We apologize for this. We repeated the experiment with wurst;pio transheterozygotes and generated wurst;pio double mutant embryos to improve the quality. The gas filling assay is shown in Fig. 3, with brightfield overview images (10x air objective) and close-ups of the dorsal trunks (25x glycerin objective). Both show the gas-filling defects of dorsal trunk tubes. In a subsequent confocal analysis of chitin stainings in late-stage 17 embryos, we found that tracheal tube lumens are collapsed in the transheterozygous and double mutant embryos.

      4.) Lines 246ff: the protein sizes are given for the mCherry:chimeric proteins; an estimate of the native Pio portions should be given.

      The endogenous Pio protein has a calculated mass of about 50.82 kDa. We now state this in the corresponding legend of Fig. 6.

      5.) In Figure 6A, the appearance of chitin in the wildtype tube is different compared to the Np mutant situation, more filamentous. Can the authors comment on that?

      The reviewer is correct. Chitin cable formation in Np mutant embryos is normal but lacks the condensation process; therefore, the fiber structure of the chitin matrix differs from that of control embryos in late stage 16 and stage 17 (see Drees et al., PLOS Genetics, 2019).

      6.) In the discussion section, I would appreciate if the timing of events was discussed or even shown in a model. The central question is: how are the functions of Pio and Np coordinated in time? As I understand, Np should not cleave Pio before morphogenesis is completed. Is there any example in the literature for how such an interaction could be controlled? The overexpression of Np shows that either the ratio between Np and Pio is important, or the btl promoter expresses Np at the "wrong" time point.

      We thank the reviewer for this interesting comment.

      Of course, we did not measure forces, but it has been published that axial forces appear at the apical cell membrane during stage 16 tube expansion. Our data show that Np cleaves the Pio ZP domain and that the subsequent release increases during stage 16. The cleaved and released Pio is enriched in the lumen during stage 16, from where it is internalized during stage 17 with the help of Wurst-mediated endocytosis. This is supported by several in vivo studies, video microscopy, antibody stainings and biochemical data, such as the interaction of Pio and Dumpy as well as the identification of different Pio products with and without Np cleavage. Moreover, we found membrane bulges that increase in size during stage 16 and identified a subsequent tear-off of the chitin matrix in Np mutant embryos. Thus, we propose that Np is required to cleave Pio-Dumpy linkages at the membrane-matrix interface when tubes elongate and the postulated forces appear at the cell membrane during tube elongation in stage 16 embryos.

      We stated this in the discussion as follows:

      “The membrane defects observed in both Pio and Np mutants indicate errors in the coupling of the membrane matrix due to the involvement of Pio (Figs. 1,7). ..., the large membrane bulges in Np mutants affect the membrane and the apical matrix (Fig. 7). Since apical Pio is not cleaved in Np mutants (Fig. 7D), the matrix is not uncoupled from the membrane as in pio mutant embryos but is likely more intensely coupled, which leads to tearing of the matrix axially along the membrane bulges (Figs. 7, 9), when the tube expands in length.”

      How could Np be regulated at the membrane? Np is a zymogen that very likely undergoes ectodomain shedding for activation, similar to what has been described for matriptases. Additionally, human matriptase requires transient interaction of the stem region with its cognate inhibitor HAI-2, which Drosophila lacks (see Drees et al., PLOS Gen, 2019). Thus, the regulation of Np activation is not known.

      Further, we observed that Dumpy is not degraded in Np mutant embryos during stage 17. Nevertheless, in a previous publication, we showed that btl-G4 driven Np expression rescues Np mutant phenotypes in a time-specific manner. We used the btl-G4 driver line for these rescue experiments to express Np in tracheal cells. This restored tracheal Dumpy degradation in Np mutant embryos. Thus, btl-G4 driven Np overexpression is able to rescue Np mutant tracheal phenotypes in a time-specific manner, although Gal4 is expressed from early tracheal development onwards. Further, btl-Gal4 driven Np expression mimics the endogenous Np, which is expressed from stage 11 onwards in all tracheal cells throughout embryogenesis (see Drees et al., PLOS Gen, 2019).

      Based on these experiments, we conclude that the btl-G4-driven Np overexpression can cleave Pio ZP domain in stage 16 embryos at the correct time.

      However, the ratio of Np to Pio expression is essential, in that btl-Gal4-driven Np overexpression may cause cleavage of a higher number of Pio proteins and the release of critical Pio-Dumpy linkages at the cell membrane and matrix. Thus, increased Pio shedding into the lumen reduces Pio linkages at the membrane, resulting in a pio mutant-like tracheal overexpansion upon btl-Gal4-driven Np overexpression.

      Finally, we were able to address the reviewer’s question in a new experiment. We used btl-Gal4 driven UAS-Np embryos for Pio antibody staining. This revealed Pio enrichment at the tracheal chitin cable in stage 14 and 15 embryos. In contrast, stage 16 embryos showed numerous Pio puncta appearing across the entire tube lumen, indicating that Np mediates Pio shedding specifically in stage 16 embryos and not before. This Np-controlled Pio release modifies tube length control.

      Therefore, we stated this in the manuscript as follows:

      Results:

      “Our data assumes that Np overexpression may enhance Pio shedding in stage 16 embryos, affecting the Pio-mediated ZP matrix function. Upon breathless (btl)-Gal4-mediated expression of UAS-Np in tracheal cells, we observed a high amount of Pio puncta across the entire tracheal tube lumen, specifically in stage 16 embryos but not in earlier stages (Fig. S13). Consistently tracheal Np overexpression led to tube overexpansion in stage 16 embryos resembling the pio mutant phenotype (Fig. 8A,B). Thus, Np-mediated Pio shedding controls Pio function.”

      Discussion:

      “The btl-Gal4-driven Np expression mimics the endogenous Np from stage 11 onwards in all tracheal cells throughout embryogenesis (Drees et al., 2019), suggesting that Np is not expressed at a wrong time point. However, the ratio between Np and Pio is essential. We assume that Np overexpression increases Pio shedding, resulting in a pio loss-of-function phenotype. Thus, the tube length overexpansion upon Np overexpression indicates that Pio cleavage is required for tube length control.

      Our observation that the membrane deformations are maintained in Np mutant embryos supports our postulated Np function to redistribute and deregulate membrane-matrix associations in stage 16 embryos when tracheal tube length expands. In contrast, Np overexpression potentially uncouples the Pio-Dpy ZP matrix membrane linkages resulting very likely in unbalanced forces causing sinusoidal tubes.”

      7.) Also for the discussion: We have two situations where Pio amounts/density are enhanced at the apical plasma membrane. The wurst experiments on lines 136ff show that Pio amount and density depends on endocytosis; is the wurst phenotype (Figure 2), at least partially, due to over-presentation of Pio? Likewise, in Figure 2C, there is more Pio in Cht2 overexpressing tracheae (but there is overall more Pio in these tracheae) - is actually endocytosis reduced in chitin-less luminal matrices? First: does the Pio signal at the apical plasma membrane correspond to membrane-Pio or free-Pio? Second, as in the case of wurst: would more Pio on the membrane (density) affect tracheal dimensions in Cht2 over expressing tracheae? Or are the consequences of Pio accumulation in the apical plasma membrane different in Cht2 and wurst backgrounds? Maybe cleavage of Pio and its endocytosis are dependent on its interaction with the chitin matrix. These questions connect to the question immediately above: how are the functions of the different players coordinated in space and time? We need a discussion on this issue.

      We thank the reviewer for this very important suggestion to discuss how the functions of the different players are coordinated in space and time, and we apologize for not having done so before.

      As this is an important point, we tried to address all questions raised by the reviewer and discuss them in several new paragraphs in the discussion:

      "Indeed, the anti-Pio antibody, which can detect all different Pio variants, showed a punctate Pio pattern overlapping with the apical cell membrane marker Uif at the dorsal trunk cells of stage 16 embryos. Additionally, Pio antibody also revealed early tracheal expression from embryonic stage 11 onwards, and due to Pio function in narrow dorsal and ventral branches, strong luminal Pio staining is detectable from early stage 14 until stage 17, when airway protein clearance removes luminal contents.

      We generated mCherry::Pio as a tool for in vivo Pio expression and localization pattern analysis during tube lumen length expansion. The mCherry::Pio resembled the Pio antibody expression pattern from early tracheal development onwards. However, luminal mCherry::Pio enrichment occurs specifically during stage 16, when tubes expand. The stage 16 embryos showed mCherry::Pio puncta accumulating apically in dorsal trunk cells. Moreover, mCherry::Pio puncta partially overlapped with Dpy::YFP and chitin at the taenidial folds, forming at apical cell membranes. Supported by several observations, such as antibody staining, Video monitoring, FRAP experiments, and Western Blot studies (Figs. 4,5), these findings indicate that Pio may play a significant role at the apical cell membrane and matrix in dorsal trunk cells of stage 16 embryos.

      Furthermore, we show that Np mediates Pio ZP domain cleavage for luminal release of the short Pio variant during ongoing tube length expansion. The luminal cleaved mCherry::Pio is enriched at the end of stage 16 and finally internalized by the subsequent airway clearance process during stage 17 after tube length expansion. Such rapid luminal Pio internalization is consistent with a sharp pulse of endocytosis rapidly internalizing the luminal contents during stage 17 (Tsarouhas et al., 2007). Wurst is required to mediate the internalization of proteins in the airways (Behr et al., 2007; Stümpges and Behr, 2011). Consistently, during stage 17, luminal Pio antibody staining fades in control embryos but not in Wurst-deficient embryos.

      Nevertheless, Pio and its endocytosis depend on its interaction with the chitin matrix and the Np-mediated cleavage. In stage 16 wurst and mega mutant embryos, we detect Pio antibody staining at the chitin cable, suggesting that Pio is cleaved and released into the dorsal trunk tube lumen. Also, the Cht2 overexpression did not prevent the luminal release of Pio. However, reduced wurst, mega function, and Cht2 overexpression caused an enrichment of punctate Pio staining at the apical cell membrane and matrix (Figs. 1,2). Although the three proteins are involved in different subcellular requirements, they all contribute to the determination of tube size by affecting either the apical cell membrane or the formation of a well-structured apical extracellular chitin matrix, indicating that changes at the apical cell membrane and matrix in stage 16 embryos affect the Pio pattern at the membrane. It also shows that local Pio linkages at the cell membrane and matrix are still cleaved by the Np function for luminal Pio release, which explains why those mutant embryos do not show pio mutant-like membrane deformations and Np-mutant-like bulges. This is in line with our observations that tracheal Pio overexpression cannot cause tube size defects as the Np function is sufficient to organize local Pio linkages at the membrane and matrix. Therefore, it is unlikely that tracheal tube length defects in wurst and mega mutants as well as in Cht2 misexpression embryos are caused by the apical Pio density enrichment.

      Nevertheless, oversized tube length due to the misregulation of the apical cell membrane and adjacent chitin matrix may cause changes to local Pio set linkages and the need for Np-mediated cleavage. Strikingly, we observe a lack of Pio release in Np mutants. This shows that Pio density at the membrane versus lumen depends predominantly on Np function. The molecular mechanisms that coordinate the Np-mediated Pio cleavage are unknown and will be necessary for understanding how tubes resist forces that impact cell membranes and matrices. On the other hand, Pio is required for the extracellular secretion of its interaction partner Dpy. At the same time, Dpy is needed for Pio localization at the cell membrane and its distribution into the tube lumen. Consistently, in vivo, mCherry::Pio and Dpy::eYFP localization patterns overlap at the apical cell surface and within the tube lumen. These observations support our model that Pio and Dpy interact at the cell surface where Np mediates Pio cleavage to support luminal Pio release by the large and stretchable matrix protein Dpy (Fig. 9).

      Taenidial organization prevents the collapse of the tracheal tube. Therefore, cortical (apical) actin organizes into parallel-running bundles that proceed to the onset of cuticle secretion and correspond precisely to the cuticle's taenidial folds (Matusek et al., 2006; Öztürk-Çolak et al., 2016). Mutant larvae of the F-actin nucleator formin DAAM show mosaic taenidial fold patterns, indicating a failure of alignment with each other and along the tracheal tubes (Matusek et al., 2006). In contrast, pio mutant dorsal tracheal trunks contained increased ring spacing (Fig. 3A). Fusion cells are narrow doughnut-shaped cells where actin accumulates into a spotted pattern. Formins, such as Diaphanous, are essential in organizing the actin cytoskeleton. However, we do not observe dorsal trunk tube fusion defects as found in the presence of the activated diaphanous.

      On the other hand, ectopic expression of DAAM in fusion cells induces changes in apical actin organization but does not cause any phenotypic effects (Matusek et al., 2006). DAAM is associated with the tyrosine kinase Src42A (Nelson et al., 2012), which orients membrane growth in the axial tube dimension (Förster and Luschnig, 2012). The Src42 overexpression elongates tracheal tubes due to flattened axially elongated dorsal trunk cells and AJ remodeling. Although flattened cells and tube overexpansion are similar in pio mutant embryos, we did not observe a mislocalization of AJ components, as found upon constitutive Src42 activation (Förster and Luschnig, 2012). Instead, we detected an unusual stretched appearance of AJs at the fusion cells of pio mutant dorsal trunks, which to our knowledge, has not been observed before and may play a role in regulating axial taenidial fold spacing and tube elongation.

      Self-organizing physical principles govern the regular spacing pattern of the tracheal taenidial folds (Hannezo et al., 2015). The actomyosin cortex and increased actin activity before and turnover at stage 16 drive the regular pattern formation. However, the cell cortex and actomyosin are in frictional contact with a rigid apical ECM. The Src42A mutant embryos contain shortened tube length but increased taenidial fold period pattern due to decreased friction. In contrast, the chitin synthase mutant kkv1 has tube dilation defects and no regular but an aberrant pearling pattern caused by zero friction (Hannezo et al., 2015).

      In contrast, pio mutant embryos do not contain tube dilation defects or shortened tubes but increased tube length (Figs. 1; 8; S1). Furthermore, our cbp and antibody stainings reveal the presence of a luminal chitin cable and a solid aECM structure in pio mutant stage 16 embryos (Figs. 8, S1; S6). In addition, apical actin enrichment in tracheal cells of pio mutant embryos appeared wt-like. Nonetheless, pio mutant embryos show an increased taenidial fold period compared with wt, indicating a decreased friction. Thus, we propose that the lack of Pio reduces friction. Reasons might be subtle defects of actomyosin constriction or chitin matrix, which we have not detected in the pio mutant tracheal cells. Further reasons for lower friction might be the loss of Pio set local linkages between apical cortex and aECM in stage 16 embryos, which are modified by Np, as proposed in our model (Fig. 9).

      Heterozygous and homozygous pio mutant embryos generally do not show tubal collapse. However, the loss of Pio and accompanying lack of Dpy secretion in stage 17 pio mutant embryos led to the loss of a Pio/Dpy matrix, impacting the late embryonic maturation and differentiation of a normal chitin matrix at the apical cell surface. TEM images reveal reduced dense chitin matrix material at taenidial folds and misarranged taenidial fold pattern (Figs. 1; S2), suggesting impaired taenidial function prevents tube lumen from collapsing after tube protein clearance. Wurst knockdown and mutant embryos do not show general tube collapse, but luminal chitin fiber organization is disturbed in stage 17 embryos (Behr et al., 2007). Therefore, transheterozygous wurst;pio mutant embryos may combine both defects and suffer from maturation deficits of the chitin/ZP matrix at the apical cell surface and within the tube lumen, which finally causes a high number of embryos with incomplete gas filling due to tube collapse. These maturation deficits are even more dramatic in the wurst;pio double mutants, which show no gas filling.”

      8.) The sentence on line 242ff should be rephrased: "dynamic" and "elastic" are not opposites.

      We thank the reviewer for the careful reading. We revised the sentence as follows:

      “Our FRAP data suggest that Pio is the dynamic part of the tracheal ZP-matrix, while the static Dpy modulates mechanical tension within the matrix”

      9.) A central question to me is the amounts and the density of factors in different genetic backgrounds as mentioned above. Is there any mechanism adjusting the amounts or the density of the players according to the size of the apical plasma membrane or the tracheal lumen? Pio seemingly responds to these changes.

      We would like to know the molecular mechanisms that control the density of players at the apical membrane. This question is important and could be the starting point for novel scientific investigations. Mechanisms of protein trafficking, such as exocytosis, recycling and endocytosis regulate delivery and internalization of proteins at the apical cell membrane. Furthermore, protein junctions at the lateral membrane may recognize and therefore may respond to low and high mechanical stresses between cells that appear during tube length expansion. However, we did not observe any hint for misregulation of Pio expression levels in the different mutants which affect endocytosis, SJs and luminal ECM. But we observed a shift of Pio levels between apical cell membrane/matrix and lumen in wurst, mega mutants and Cht2 overexpression. This shift is analyzed with diverse ZEN tools and quantified (Fig. 2D-F; Fig. S4B). As discussed in the new paragraph, this shift is very likely caused by changes at the apical cell membrane and chitin matrix which impact Pio shedding. Moreover, we observe the lack of Pio release in Np mutants. This shows that Pio density at the membrane versus lumen depends predominantly on Np-mediated cleavage. As discussed above, how Np is activated at the apical cell membrane to cleave Pio is not known.

      10.) The connection of Pio and taenidia is mentioned in the results section (page 7) but not discussed.

      We appreciate the careful reading and comments of the reviewer very much. We included the connection of Pio and taenidia in the discussion section as follows:

      “Taenidial organization prevents the collapse of the tracheal tube. Therefore, cortical (apical) actin organizes into parallel-running bundles that proceed to the onset of cuticle secretion and correspond precisely to the cuticle's taenidial folds (Matusek et al., 2006; Öztürk-Çolak et al., 2016). Mutant larvae of the F-actin nucleator formin DAAM show mosaic taenidial fold patterns, indicating a failure of alignment with each other and along the tracheal tubes (Matusek et al., 2006). In contrast, pio mutant dorsal tracheal trunks contained increased ring spacing (Fig. 3A). Fusion cells are narrow doughnut-shaped cells where actin accumulates into a spotted pattern. Formins, such as Diaphanous, are essential in organizing the actin cytoskeleton. However, we do not observe dorsal trunk tube fusion defects as found in the presence of the activated diaphanous.

      On the other hand, ectopic expression of DAAM in fusion cells induces changes in apical actin organization but does not cause any phenotypic effects (Matusek et al., 2006). DAAM is associated with the tyrosine kinase Src42A (Nelson et al., 2012), which orients membrane growth in the axial tube dimension (Förster and Luschnig, 2012). The Src42 overexpression elongates tracheal tubes due to flattened axially elongated dorsal trunk cells and AJ remodeling. Although flattened cells and tube overexpansion are similar in pio mutant embryos, we did not observe a mislocalization of AJ components, as found upon constitutive Src42 activation (Förster and Luschnig, 2012). Instead, we detected an unusual stretched appearance of AJs at the fusion cells of pio mutant dorsal trunks, which to our knowledge, has not been observed before and may play a role in regulating axial taenidial fold spacing and tube elongation.

      Self-organizing physical principles govern the regular spacing pattern of the tracheal taenidial folds (Hannezo et al., 2015). The actomyosin cortex and increased actin activity before and turnover at stage 16 drive the regular pattern formation. However, the cell cortex and actomyosin are in frictional contact with a rigid apical ECM. The Src42A mutant embryos contain shortened tube length but increased taenidial fold period pattern due to decreased friction. In contrast, the chitinase synthase mutant kkv1 has tube dilation defects and no regular but an aberrant pearling pattern caused by zero friction (Hannezo et al., 2015).

      In contrast, pio mutant embryos do not contain tube dilation defects or shortened tubes but increased tube length (Figs. 1; 8; S1). Furthermore, our cbp and antibody stainings reveal the presence of a luminal chitin cable and a solid aECM structure in pio mutant stage 16 embryos (Figs. 8, S1; S6). In addition, apical actin enrichment in tracheal cells of pio mutant embryos appeared wt-like. Nonetheless, pio mutant embryos show an increased taenidial fold period compared with wt, indicating a decreased friction. Thus, we propose that the lack of Pio reduces friction. Reasons might be subtle defects of actomyosin constriction or chitin matrix, which we have not detected in the pio mutant tracheal cells. Further reasons for lower friction might also be the loss of Pio set local linkages between apical cortex and aECM in stage 16 embryos, which are modified by Np, as proposed in our model (Fig. 9).

      Heterozygous and homozygous pio mutant embryos generally do not show tubal collapse. However, the loss of Pio and accompanying lack of Dpy secretion in stage 17 pio mutant embryos led to the loss of a Pio/Dpy matrix, impacting the late embryonic maturation and differentiation of a normal chitin matrix at the apical cell surface. TEM images reveal reduced dense chitin matrix material at taenidial folds and misarranged taenidial fold pattern (Figs. 1; S2), suggesting impaired taenidial function prevents tube lumen from collapsing after tube protein clearance. Wurst knockdown and mutant embryos do not show general tube collapse, but luminal chitin fiber organization is disturbed in stage 17 embryos (Behr et al., 2007). Therefore, transheterozygous wurst;pio mutant embryos may combine both defects and suffer from maturation deficits of the chitin/ZP matrix at the apical cell surface and within the tube lumen, which finally causes a high number of embryos with incomplete gas filling due to tube collapse. These maturation deficits are even more dramatic in the wurst;pio double mutants, which show no gas filling.”

      11.) Dp remains cytoplasmic in pio mutant background - is the pio mutant phenotype due to defects by lack of Pio AND Dp function? What is the tracheal phenotype of dp mutants?

      It has been discussed that dumpyolvr and pio mutants show similar phenotypes in early tracheal development (Jazwinska, 2003) and that dumpyolvr mutant embryos compromise tube size in combination with shrub mutants. Additional quantifications of the dumpyolvr mutant showed significantly increased tube length (Dong 2014). We used the dumpyolvr mutant [In(2L)dpyolvr], an X-ray-induced mutation of the dumpy gene locus (Wilkin 2000). dumpyolvr mutants resemble pio null mutant tracheal phenotypes, including detached dorsal and ventral branches and an oversized tracheal dorsal trunk with curly appearance in late embryos. We included chitin and Uif stainings of stage 16 dumpy mutant embryos (Fig. S10).

      These data suggest that the pio mutant phenotype is due to a lack of both Pio and Dumpy, which would support our model of Pio and Dumpy protein interaction in the extracellular space of the tube lumen.

      In wt embryos, Pio is predominantly found in the luminal chitin cable; in contrast, in dumpy mutant embryos most Pio is not at the luminal chitin cable. Reduced luminal Pio staining together with apical Pio accumulation in dumpy mutant embryos shows that Dumpy is required for luminal Pio release in stage 16 embryos. This supports our model that the Pio and Dumpy interaction may link membrane and matrix and that this link responds to mechanical stress during tube expansion by Np-mediated cleavage of Pio and its accompanying luminal release due to linked Dumpy.

      12.) Lines 374ff: the reduced dorsal trunk in Np mutants is not significant; the respective statement should be formulated carefully. If we believe the statistics (no significance), this would mean that attachment of the apical plasma membrane to the luminal chitin via Pio is needed to restrict axial extension; release of Pio is needed for differentiation (taenidia formation, luminal clearance) beyond morphogenesis.

      We agree with the reviewer that the reduction of dorsal trunk length in Np mutants is not statistically significant. However, the mean value is clearly below that of wt. Therefore, we revised our statement as follows: “In Np mutant embryos, tracheal dorsal trunk length tends to be reduced compared to wt embryos.” Further, btlG4-driven overexpression of UAS-Np suggests strong Pio release from the apical membrane and therefore resembles the pio mutant tube length overexpansion (Fig. 8A,B; Fig. S13). Thus, our current observations indicate that Np-mediated Pio release at the cell membrane enables precise tube length elongation.

      We thank the referee for discussing that Pio is needed for taenidial fold formation, which would fit our findings in pio null mutant embryos. Pio mutant embryos show the appearance of taenidial folds in stage 16 embryos (airyscan) and stage 17 embryos (TEM images). However, TEM images also show chitin matrix reduction in pio mutant stage 17 embryos. Further, co-stainings of Pio with Crb and Uif, as well as co-stainings of mCherry::Pio with Dpy-GFP and cbp, confirm that Pio localizes at the apical cell membrane where taenidial folds form in late stage 16 embryos. Thus, our observations suggest that Pio and Dumpy are required at the apical membrane and matrix to stabilize taenidial folds and tube lumen during stage 17. This also includes the Np-mediated Pio release at the apical cell membrane. As requested by the referee, we summarized Pio function during late tracheal development in our simplified model (see Fig. 9).

      However, it is of note that Np-mediated Pio release increases at late stage 16 (Fig. 5A, 6D; Fig. S13) but is strongly reduced in stage 17 embryos. In contrast, thin taenidial folds are formed at late stage 16, become thicker and form at fusion points during stage 17, and reach their most mature form when the intraluminal chitin cable is cleared (Öztürk-Çolak et al., eLife, 2016). Thus, the patterns of Pio release and taenidial fold differentiation do not fully match. Moreover, in preliminary experiments we observe Pio antibody staining in stage 17 embryos at the apical cell membrane of dorsal trunks (data not shown). Furthermore, lumen clearance of Obst-A, Knk, Serp and Verm is not affected in pio mutant embryos, but unknown luminal ECM contents remained (Fig. 1D). Therefore, we will follow this very interesting idea in future experiments.

      Nonetheless, we state in the results that Pio shedding is essential:

      “Our data assumes that Np overexpression may enhance Pio shedding in stage 16 embryos, affecting the Pio-mediated ZP matrix function. Upon breathless (btl)-Gal4-mediated expression of UAS-Np in tracheal cells, we observed a high amount of Pio puncta across the entire tracheal tube lumen, specifically in stage 16 embryos but not in earlier stages (Fig. S13). Consistently tracheal Np overexpression led to tube overexpansion in stage 16 embryos resembling the pio mutant phenotype (Fig. 8A,B). Thus, Np-mediated Pio shedding controls Pio function.”

      13.) Why don't we see the apical Pio signal in Figure 4B?

      The red arrowhead points to apical mCherry::Pio punctate staining in Fig. 5B (previously Fig. 4B), in the close-up of the “bleached area” before bleaching and 56 min post bleaching. However, in vivo bleaching experiments do not allow additional antibody stainings to precisely detect the apical cell membrane. Further, Dpy::eYFP marks the tube lumen and the apical cell surface. The latter showed adjacent mCherry::Pio punctate staining. However, due to bleaching, the Dpy signal was not detectable in the area.

      14.) The Strep signals in the merges in Figure 7C are not well visible.

      We are not sure which Strep signal the reviewer is referring to in Fig. 7C, which is now Fig. 8C. The top panel shows the Strep signal (right panel) overlapping with GFP in cells that do not express Np or human matriptase. Thus, the TGFB3 ZP domain is not cleaved, and the intracellular GFP and also the extracellular Strep signals are maintained and overlap.

      In contrast, when Np or human matriptase is added, the TGFB3 ZP domain is cleaved and only the intracellular GFP signal is retained, whereas the extracellular Strep signal is released from the cell surface. This explains why the Strep signal is barely detectable in the middle and lower panels of Fig. 8C.

      Reviewer #1 (Significance):

      This work brings together several factors (Pio, Dp, Np, Wst etc) already known to be needed for tracheal morphogenesis and differentiation in the embryo of D. melanogaster. Having worked myself with some of these factors, however, I recognize that the interaction between these factors is novel and very exciting. The experiments strongly indicate a new mechanism of cell-ECM connection that seems to be conserved to some extent (as they provide preliminary data on an example from humans). By integrating the functions of different factors, the work provides ample opportunity for future projects to elucidate this mechanism in detail. Therefore, I expect that it will have a significant impact not only on the field of developmental cell biology but also, due to the conserved proteins involved (ZP proteins, Matriptase), on the field of cell biology of human diseases.

      Reviewer #2 (Evidence, reproducibility and clarity):

      _The figures are clear, and the questions well addressed. However, I find that some of the claims are not completely backed by the data presented and have some suggestions that will hopefully make some points clearer.

      Major comments

      1.) In the abstract and at the end of the introduction the authors claim that they show that Pio, Dpy and Np support the balancing of mechanical stresses during tracheal tube elongation. However, this is not shown in this manuscript, where tension or mechanical stress were not measured and it is therefore speculative._

      As requested by the reviewer, we deleted “support balancing of” in the final sentence of the Introduction. Please note that we did not use the term “balancing of mechanical stresses” in the abstract.

      However, we revised the abstract.

      It has been shown previously that forces and mechanical tension rise when the apical membrane expands, and that the elastic extracellular matrix, which is anchored to the membrane, balances these forces (Dong et al., 2014). Furthermore, it has been shown that the gigantic and elastic Dumpy protein modulates mechanical tension (Wilkin et al., 2000). Thus, these previous publications state that mechanical tension rises at the apical cell membrane and matrix when tubes expand during stage 16 and that Dpy is part of that molecular process, which we included in the abstract as essential background information.

      “The apical membrane is anchored to the apical extracellular matrix (aECM) and causes expansion forces that elongate the tracheal tubes. The aECM provides a mechanical tension that balances the resulting expansion forces, with Dumpy being an elastic molecule that modulates the mechanical stress on the matrix during tracheal tube expansion.”

      Nonetheless, our results show that Np-mediated Pio cleavage increases during stage 16 as a response to tube length expansion, which is accompanied by forces as postulated by others (see above). We further observe that the membrane bulges and the chitin matrix tears off when Pio cleavage does not occur in Np mutant embryos. Our data further show that Pio and Dumpy interact and that Pio release is prevented in Dpy mutant embryos. Altogether, this suggests that the Np-mediated Pio cleavage responds to tube expansion and requires Dpy for luminal Pio release.

      We therefore claim in the final sentence of the introduction that “…ZP domain proteins Pio and Dumpy, as well as the protease Np respond to mechanical stresses when tracheal tubes elongate”. The corresponding changes are marked in red.

      2.) The authors state that all pio CRISPR/Cas9 generated mutants display identical tracheal phenotypes, however these data are not shown. Tracheal phenotypes, in particular DT phenotypes, of all mutants generated should be shown in supplementary materials.

      As requested by the reviewer, we included the data in the supplement. The pio5M and pio11R alleles showed embryonic lethality and a 100% gas filling defect resembling the pio17C allele. Additionally, we extended the tracheal analysis with the pio5M allele and identified tube size defects, irregular pattern of taenidial folds and apical membrane deformation, altogether resembling the pio17C allele. These new data are shown in the supplement Fig. S1.

      We clarify this in the results section as follows:

      “The tracheal phenotypes of pio5m are shown in the supplement (Fig. S1B-F). In all other Figures, we show images of the pio17c allele.”

      3.) At stage 16, pio null mutants display DT overelongation phenotypes (Fig. 1). The authors should quantify this phenotype.

      As requested by the reviewer, we quantified the DT overelongation phenotypes for pio5M (Fig. S1). The quantification of pio17C was already shown in Fig. 6B, now Fig. 8B.

      4.) The authors analyse Pio distribution under tubular stress, using mega mutants and Chitinase overexpression. Pio localization changes in these genetic backgrounds and this is shown in Figure 2 only in a qualitative manner. The authors should measure Pio localization at the lumen and at the membrane and provide quantitative data.

      As requested by the referee, we measured Pio localization recognized by the anti-Pio antibody at the lumen and at the membrane to provide quantitative data. These are shown in Fig. 2E.

      All images were taken with a Zeiss Airyscan. For statistical analysis, we used the profile tool of the Zeiss ZEN 2.3 black software. This tool allows the measurement and comparison of fluorescence pixel intensities of individual channels. We determined the fluorescence intensity profile across the tube to identify values at the apical membrane and in the tube lumen at a minimum of 10 different positions of DTs (metameres 5 to 6) in two distinct embryos for each genetic background. The maximum values at the membrane versus the tube lumen were expressed as a ratio and compared between control, mega mutant and Cht2 overexpression embryos. The control embryos showed a ratio below 0.4, the Cht2 overexpression embryos a ratio of 1.2 and mega mutants a ratio of about 0.9. These quantitative data confirm the statement that Pio localization increases at and near the apical cell membrane with respect to the lumen in mega mutants and in Cht2 overexpression embryos.
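      The ratio described above can be illustrated with a minimal sketch. It assumes that each line profile has been exported from the imaging software as a plain list of pixel intensities, and that the membrane and lumen regions are given as index windows chosen by the user. The function names, the window boundaries and the example values are hypothetical and are not taken from the manuscript or from ZEN.

```python
from statistics import mean


def membrane_to_lumen_ratio(profile, membrane_windows, lumen_window):
    """Peak membrane intensity divided by peak luminal intensity for one profile.

    profile          -- list of pixel intensities along a line across the tube
    membrane_windows -- list of (start, stop) index pairs covering the apical membranes
    lumen_window     -- (start, stop) index pair covering the tube lumen
    """
    membrane_peak = max(max(profile[start:stop]) for start, stop in membrane_windows)
    lumen_peak = max(profile[lumen_window[0]:lumen_window[1]])
    if lumen_peak == 0:
        raise ValueError("lumen peak is zero; ratio is undefined")
    return membrane_peak / lumen_peak


def mean_ratio(profiles, membrane_windows, lumen_window):
    """Average the ratio over several profiles (e.g. several positions per embryo)."""
    return mean(
        membrane_to_lumen_ratio(p, membrane_windows, lumen_window) for p in profiles
    )


# Hypothetical profile: two membrane peaks flanking a bright luminal cable.
example_profile = [2, 10, 30, 12, 40, 100, 45, 11, 28, 9, 3]
print(membrane_to_lumen_ratio(example_profile, [(1, 4), (7, 10)], (4, 7)))  # 0.3
```

      In this convention a ratio below 1 means that the luminal signal dominates, and a ratio above 1 means that the signal at the membrane dominates.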

      5.) Surprisingly and interestingly, wurst;pio transheterozygotes display very strong tracheal defects. The authors say they observe gas filling defects; however, it is not clear from figure 2E if this is indeed the case. From the panel in the figure, it looks like these embryos suffer from strong tracheal morphogenetic defects. It would be necessary to have a better analysis of these embryos. What is the penetrance of this phenotype? If this is 100% penetrant, one would expect it to be lethal. Therefore, double mutant balanced stocks are not viable? Having analyzed the phenotypes and confirmed which morphogenetic defects the transheterozygote embryos present, how does this genetic interaction fit with the model presented?

      We are thankful to the reviewer for this interesting point of view suggesting that the wurst;pio embryos display tracheal morphogenetic defects. First, our data show that only 11.6% of the wurst;pio transheterozygous embryos completed gas filling and survived until adulthood. In contrast, 88.4% of transheterozygous wurst;pio mutant embryos did not complete gas filling, which is now presented in Fig. 3B. The corresponding quantification is presented in Fig. 3D. Importantly, the 88.4% of wurst;pio transheterozygous embryos which show gas filling defects do not hatch as larvae and die.

      As requested, we performed a better morphogenetic analysis, which is presented in Fig. 3C. Analysis of the gas filling defects with light microscopy were repeated with a better objective (Zeiss Apochromat 25x Gly; 0.8 NA). Indeed, this analysis revealed a strongly compromised tube lumen morphology with irregular tube lumen pattern as if tubes twist and bend. This tube lumen deformation was further confirmed with the confocal analysis of chitin staining (cbp). The tube lumen of stage 17 transheterozygous wurst;pio mutant embryos showed irregular lumen pattern with unusual twists and even partially collapsed tubes.

      Furthermore, as asked by the referee, we generated the wurst,pio double mutation. All wurst,pio double mutant embryos lacked gas filling. In a more in-depth analysis of the tube lumen with a high-performance objective we could not identify any normal tube lumen in stage 17 embryos. Instead the double mutant embryos revealed completely collapsed tracheal tubes. This was confirmed by the chitin staining and confocal analysis. All new data are presented in the supplement.

      As shown in our manuscript and in previous publications, neither pio nor wurst mutant embryos affect cell polarity or gross organization of the actin and tubulin cytoskeleton. However, we found that wurst mutant embryos showed irregular apical membrane expansion at the tube lumen (Behr et al., 2007; legend Fig. 4), irregular chitin fiber organization and, to some extent, collapsed tube lumen. In pio mutant embryos we found deformed apical membrane of DTs, irregular pattern of taenidial folds and, to some extent, collapsed tube lumen. Thus, the apical membrane is the common target of both proteins in late embryonic development, suggesting that Pio provides stability and Wurst mediates the internalization of proteins at the apical membrane.

      We discussed it as follows:

      “Nevertheless, Pio and its endocytosis depend on its interaction with the chitin matrix and the Np-mediated cleavage. In stage 16 wurst and mega mutant embryos, we detect Pio antibody staining at the chitin cable, suggesting that Pio is cleaved and released into the dorsal trunk tube lumen. Also, the Cht2 overexpression did not prevent the luminal release of Pio. However, reduced wurst, mega function, and Cht2 overexpression caused an enrichment of punctate Pio staining at the apical cell membrane and matrix (Figs. 1,2). Although the three proteins are involved in different subcellular requirements, they all contribute to the determination of tube size by affecting either the apical cell membrane or the formation of a well-structured apical extracellular chitin matrix, indicating that changes at the apical cell membrane and matrix in stage 16 embryos affect the Pio pattern at the membrane. It also shows that local Pio linkages at the cell membrane and matrix are still cleaved by the Np function for luminal Pio release, which explains why those mutant embryos do not show pio mutant-like membrane deformations and Np-mutant-like bulges. This is in line with our observations that tracheal Pio overexpression cannot cause tube size defects as the Np function is sufficient to organize local Pio linkages at the membrane and matrix. Therefore, it is unlikely that tracheal tube length defects in wurst and mega mutants as well as in Cht2 misexpression embryos are caused by the apical Pio density enrichment.”

      “Heterozygous and homozygous pio mutant embryos generally do not show tubal collapse. However, the loss of Pio and accompanying lack of Dpy secretion in stage 17 pio mutant embryos led to the loss of a Pio/Dpy matrix, impacting the late embryonic maturation and differentiation of a normal chitin matrix at the apical cell surface. TEM images reveal reduced dense chitin matrix material at taenidial folds and misarranged taenidial fold pattern (Figs. 1; S2), suggesting impaired taenidial function prevents tube lumen from collapsing after tube protein clearance. Wurst knockdown and mutant embryos do not show general tube collapse, but luminal chitin fiber organization is disturbed in stage 17 embryos (Behr et al., 2007). Therefore, transheterozygous wurst;pio mutant embryos may combine both defects and suffer from maturation deficits of the chitin/ZP matrix at the apical cell surface and within the tube lumen, which finally causes a high number of embryos with incomplete gas filling due to tube collapse. These maturation deficits are even more dramatic in the wurst;pio double mutants, which show no gas filling.”

      6.) The mCherry::Pio Dpy::eYFP time lapse analysis and FRAP experiments are very interesting. However, it is not clear to which degree bleaching occurs in the tracheal lumen. The authors claim that recovery is very fast and can be seen from minute 2; however, frame-by-frame analysis of Movie S2 does not show a clear difference between luminal Pio from minute 0 to minute 2. Rough comparison with the luminal area surrounding the bleached area does not show a clear difference in luminal Pio before and after photobleaching. To claim fast recovery of luminal Pio after photobleaching, the authors should quantify luminal Pio before and after bleaching.

      We agree with the reviewer and deleted “fast”. Video S2 shows intracellular mCherry::Pio recovery within 2 min after photobleaching and extracellular (luminal) recovery within 6 min after photobleaching, when the first large mCherry::Pio puncta appear at the apical surface of the bleached area. Nonetheless, mCherry::Pio puncta appear in the lumen, indicating recovery, whereas Dpy::eYFP did not recover.

      We state this in the Results section as follows:

      “In stage 16 embryos mCherry::Pio puncta reappeared in tracheal cells within 2 minutes of bleaching and in the tubular lumen within 6 minutes.”

      In addition, what does the normalized mCherry::Pio fluorescence in the graph in figure 4D refer to? Intracellular Pio?

      Figure 4D, now 5D, shows Western Blot signals. We assume that you refer to Fig. 4B, which is now Fig. 5B.

      We apologize for the confusion; the graph is now labeled Fig. 5B’.

      We stated in the Material section:

      “The bleaching was performed with 405 nm full laser power (50 mW) at the ROI for 20 seconds. A Z-stack covering the whole depth of the tracheal tubes in the ROI was taken at each imaging step. Fluorescence intensity in the bleached ROIs was measured after correction for embryonic movements using Fiji.”

      Thus, to clarify this point, we added to the legends:

      “Fluorescence intensities refer to the bleached ROIs as indicated with the frame in corresponding Movie S2 and were measured after correction for embryonic movements.”
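      For readers unfamiliar with how a normalized FRAP curve can be derived from ROI intensities, the following is a minimal sketch. The response does not state which normalization was applied, so the scheme used here is an assumption: it is one common convention in which the pre-bleach intensity is scaled to 1 and the first post-bleach frame to 0. The function name and the example values are hypothetical.

```python
def normalize_frap(pre_bleach, post_bleach_trace):
    """Scale ROI intensities so that pre-bleach = 1 and the first post-bleach frame = 0.

    pre_bleach        -- mean ROI intensity before bleaching
    post_bleach_trace -- ROI intensities after bleaching, in time order;
                         the first value is taken as the bleached baseline
    """
    baseline = post_bleach_trace[0]
    bleach_depth = pre_bleach - baseline
    if bleach_depth <= 0:
        raise ValueError("pre-bleach intensity must exceed the post-bleach baseline")
    return [(value - baseline) / bleach_depth for value in post_bleach_trace]


# Hypothetical ROI intensities (arbitrary units), not measured data.
print(normalize_frap(200, [40, 80, 120]))  # [0.0, 0.25, 0.5]
```

      With this convention, a value of 0.5 means that half of the bleached signal has returned to the ROI at that time point.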

      7.) When mCherry::Pio Dpy::eYFP time lapse analysis and FRAP experiments was done in an Np mutant background, the authors describe lack of Pio recovery within the lumen (Movie S3). However, when comparing control and Np mutant background embryos, Pio is not properly released into the lumen of Np mutants (as stated by the authors and seen by comparing movies S1 and S4). Furthermore, on minute 0 of the FRAP experiment in Np embryos, there is no detectable Pio in the DT lumen. Therefore, recovery was not expected in Np mutants and should not be claimed as a conclusion for this experiment.

      We thank the reviewer for the careful reading and apologize for our incorrect description. We changed it accordingly as follows:

      “In contrast to the control, extracellular mCherry::Pio is not released into the tube lumen within 56 min after bleaching in Np mutant embryos (Fig. 6C, Video S3).”

      8.) Brodu et al (Dev Cell 2010) have shown that Pio is important for cytoskeletal modulation during tracheal maturation. Pio is important for non-centrosomal microtubule (MT) arrays anchored at the tracheal cell apical membranes. In addition, MT disruption in tracheal cells leads to lumen formation defects (Brodu et al, Dev Cell 2010). In the absence of Pio, the tracheal cytoskeleton is altered, and this could explain some of the results observed. Ideally, the work should be complemented with a basic cytoskeletal analysis, but if this is not possible, the authors should discuss some of the phenotypes in light of this Pio function.

      Dear reviewer, this is a great idea. Therefore, we analyzed F-actin with Phalloidin and beta tubulin (E7 antibody, DSHB) in the dorsal trunk cells of stage 16 control and pio mutant embryos. However, tracheal cells are tiny and only gross irregularities can be detected. Confocal Z-stack analysis of the stainings did not show gross differences between control and pio mutant embryos. We observe the expected apical subcortical accumulation of the actin and tubulin cytoskeleton in dorsal trunk cells of stage 16 pio mutant embryos, which has also been shown for wt embryos elsewhere. These new data are presented in the supplement Fig. S7.

      Minor comments

      The model should not be in supplementary materials and should be moved to the main manuscript.

      We thank the reviewer for this suggestion and moved the model to the main part, now Fig. 9. As requested by reviewer 1, we extended the model to show the timing of events of Pio function.

      Throughout the manuscript embryonic stages are described using different nomenclature (stage X, stX and st X). Either way is correct, but the same nomenclature should be used throughout.

      We apologize for the different nomenclature and use "stage X" in the manuscript and "stX" in the figures for space reasons. Legend 1 clarifies the abbreviation.

      In Fig. S1 B and C the authors should specify which pio allele is being analysed (as in Fig. 7). The same should be done in the text.

      That is a good point. To be clear from the beginning, we now state the following in the first paragraph of the results:

      “The tracheal phenotypes of pio5m are shown in the supplement (Fig. S1B-F). In all other Figures, we show phenotypes of the pio17c allele.”

      Line 131, it is not correct to say that WGA visualizes cell membranes. WGA marks/stains cell membranes.

      Thanks for finding this mistake, it’s now corrected.

      Line 165 "leads to excessive tube dilation and length expansion due to strongly reduced luminal chitin" is not correct. Chitin reduction leads to excessive tube dilation but not to length expansion, as reported in the papers cited at the end of the sentence.

      Thanks very much for careful reading, we deleted “and length expansion” from the sentence.

      Line 220-221, what do authors refer to as "stage 16 wt-like control embryos"?

      Thanks for finding these mistakes. We corrected as follows:

      “In stage 16 embryos mCherry::Pio puncta….”

      Line 221, "some minutes" should be replaced by a specific number of minutes. According to Movie S2 reappearance of tracheal cell Pio happens from minute 16.

      We agree with the reviewer that we should state the time when mCherry::Pio puncta reappear. We observe the first large puncta within two minutes after bleaching in tracheal cells at the ROI (Video S2, lower cell row in the movie). We further observe the reappearance of the first large puncta at the ROI within 6 minutes in the tracheal tube lumen.

      We corrected it as follows: “In stage 16 embryos mCherry::Pio puncta reappeared in tracheal cells within 2 minutes of bleaching and in the tubular lumen within 6 minutes.”

      Line 291 "time laps" should be lapse.

      Thanks for finding the typo, it is corrected now.

      Line 302, "Pio was not shedded into the lumen but remained at the cell" should be "Pio was not shed into the lumen but remained in the cell".

      Thanks for finding the typo, it is corrected now.

      _Referees cross-commenting

      I agree. Taken together, all the comments will improve the quality of the work and of a future manuscript. Also, everything seems quite doable and will not present any problems._

      Reviewer #2 (Significance):

      _The findings shown in this manuscript shed light on the regulation of tubulogenesis by ZP proteins and how their interaction with the ECM can be regulated by proteolysis. It was known that Pio is involved in tracheal development, is secreted into the lumen, regulating tube elongation (Jaźwińska et al., Nat.Cell Biol., 2003) and anchoring MTs to the apical membrane during tubulogenesis (Brodu et al, Dev. Cell 2010). This work provides additional molecular insights into Pio dynamics and regulation during tube maturation. This work will be of interest to a broad cell and developmental biology community as it provides a mechanistic advance in ZP proteins involved in morphogenesis. It is of specific interest to the specialized field of tubulogenesis and tracheal morphogenesis.

      Field of expertise: Drosophila, morphogenesis, tracheal tubulogenesis, cytoskeleton_

      Reviewer #3 (Evidence, reproducibility and clarity):

      _Summary

      In this manuscript, Drees and colleagues analysed, during the formation and growth of tubular systems, how cells combine forces at the cell membranes while maintaining tubular network integrity. A fundamental question is to understand how cells manage to integrate the axial forces to stabilise the cell membrane and the apical extracellular matrix (aECM).

      To address this question, the authors study the formation of the tracheal system in Drosophila embryos, a well-established and detailed model system to investigate formation of tubular networks. In particular, they focused on the formation of the larger tube of the tracheal network, the dorsal trunk. The formation of this tube depends in part on axial extension along the antero-posterior axis.

      They concentrated their work on the function of Piopio (Pio), a Zona-Pellucida (ZP)-domain protein. They showed that Pio together with the protease Notopleural (Np) contribute to sensing and supporting mechanical stresses when tracheal tubes elongate, thus ensuring normal membrane-aECM morphology.

      Major Comments

      In a previous work, Drees et al. (PLOS Genetics 2019) showed that the matriptase-prostasin proteolytic cascade (MPPC) is conserved and essential for both Drosophila ECM morphogenesis and physiology.<br /> The functionally conserved components of the MPPC mediate cleavage of zona pellucida-domain (ZP-domain) proteins, which play crucial roles in organizing apical structures of the ECM in both vertebrates and invertebrates. They showed that ZP-proteins are molecular targets of the conserved MPPC and that cleavage within the ZP-domains is a conserved mechanism of ECM development and differentiation.<br /> Here, Drees et al. investigate further how the coupling between membrane and matrix takes place to ensure proper tube growth.<br /> Pio distribution and phenotypes<br /> They first focused on the tracheal phenotypes observed in a pio null mutant context. So far, the only pio mutant characterised was a point mutation in the ZP domain. Using CRISPR/Cas9, they generated new alleles of pio which are lack of function alleles. In this context, Drees and colleagues observed over-elongated dorsal trunk tubes, with bulges appearing at stage 16 between the apical domain of tracheal cells and adjacent extra-luminal matrix.<br /> Additionally, pio mutant embryos showed impaired tube lumen clearance of some of the aECM components, which prevents gas-filling of the airways.<br /> To detect Pio distribution, the authors used either an anti-Pio antibody directed toward a short stretch within the Pio ZP domain or a piomCherry::pio line generated by CRISPR/Cas9.

      _

      1.) The Pio antibody shows a strong luminal staining as already published. But the authors reported an apical membrane signal in tracheal cells. I find this apical membrane signal really difficult to observe in panel Fig. 2B. The overlap between the Pio dots and the apical membrane labelled with Uif shown in Fig 2C can be due to the 3D projection. It is only when endocytosis is impaired (Suppl Fig. 2) that the data are more convincing.

      We thank the reviewer for this important point. We are sorry for the unconvincing presentation and grateful for the chance to improve it.

      We show the 3D image of Pio puncta as voxels overlapping with Uif at the apical cell membrane. The amount of Pio voxels overlapping with the Uif-marked apical cell membrane increased in mega mutant embryos and upon tracheal Cht2 overexpression. This result is indicated by a representative region (frame) and white arrows and is now shown in Fig. 2C.

      We further used orthogonal projections across the tracheal tube of the airyscan Z-stacks. Randomly selected sections confirmed that puncta of Pio antibody staining overlap with Uif at the tube lumen. We observed overlap in controls, but increasing overlap in mega mutant and Cht2 overexpressing embryos. This result is now shown in Fig. 2E.

      However, to overcome any misinterpretations of projections, we used single images of the original airyscan Z-stacks for co-localization analysis with the Zeiss ZEN software (black, 2.3, sp1). We used two available and independent standard methods to compare fluorescence pixel intensities of different channels namely the ZEN co-localization and the ZEN profile tool. Both are described in the Materials section.

      a.) With the co-localization tool we directly compared fluorescence pixel intensities of Pio and Uif. The pixels with the highest intensity overlap, shown in the ZEN tool as the third quadrant, were set to white for better visualization in the images. These new images are included as Fig. 2D and show recurrent overlap of Pio and Uif antibody stainings (punctate pattern) along the apical cell membrane at the dorsal trunk of stage 16 control embryos. This overlap pattern increased in mega mutant and Cht2 overexpression embryos.

      b.) A second approach for comparing fluorescence intensities is the ZEN “profile” tool. Drawing a line across the tube allowed us to compare peak fluorescence pixel intensities of the different channels at distinct regions, such as the apical cell membrane and the tube lumen including the cbp-marked chitin cable. This tool detected overlap of peak fluorescence intensities of Uif and Pio antibody stainings, confirming that Pio is located together with Uif at the apical membrane of dorsal trunk tracheal cells. These new intensity profiles and the corresponding images are presented in the supplement as Fig. S4B-D. Quantifications by this method, comparing the ratio of Pio peak intensities between the apical cell membrane and the tube lumen, are presented as Fig. 2F (as requested by Reviewer 2).
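
      The two comparisons described in a.) and b.) can be illustrated with a minimal sketch. It is not the ZEN implementation: the function names, the thresholds, the index ranges and the intensity values below are hypothetical and serve only to show the logic of thresholding two channels for joint high intensity and of comparing peak intensities along a line profile.

```python
def colocalized_mask(ch_a, ch_b, thr_a, thr_b):
    """Mark pixels where BOTH channels exceed their own threshold.

    ch_a and ch_b are equally sized 2D lists of pixel intensities.
    The thresholds are illustrative, not values from the manuscript.
    """
    return [[a > thr_a and b > thr_b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(ch_a, ch_b)]


def local_peaks(profile, min_height):
    """Indices of local maxima at or above min_height along a line profile."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] > profile[i - 1]
            and profile[i] >= profile[i + 1]
            and profile[i] >= min_height]


def shared_peaks(peaks_a, peaks_b, tol=0):
    """Peaks of channel A that lie within tol pixels of a channel B peak."""
    return [p for p in peaks_a if any(abs(p - q) <= tol for q in peaks_b)]


def peak_ratio(profile, membrane_idx, lumen_idx):
    """Peak intensity in the membrane region divided by the lumen peak."""
    membrane_peak = max(profile[i] for i in membrane_idx)
    lumen_peak = max(profile[i] for i in lumen_idx)
    return membrane_peak / lumen_peak


# Hypothetical line profile across a tube: membrane, lumen, membrane.
pio = [2, 3, 40, 5, 4, 90, 6, 38, 3]
uif = [1, 4, 55, 6, 2, 3, 5, 60, 2]

pio_peaks = local_peaks(pio, min_height=20)
uif_peaks = local_peaks(uif, min_height=20)
membrane = shared_peaks(pio_peaks, uif_peaks)       # positions where both peak
lumen = [p for p in pio_peaks if p not in membrane]  # Pio-only peak positions
ratio = peak_ratio(pio, membrane, lumen)
```

      In this toy profile the Pio and Uif peaks coincide at the two membrane positions, while the central Pio peak has no Uif counterpart and is treated as luminal. The ratio then expresses how strong the membrane signal is relative to the lumen signal.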

      2.) When the authors used their CRISPR/Cas9 piomCherry::pio line to characterise Pio distribution (Fig.4), Pio is localised at the apical plasma membrane before stage 16. Only at stage 16, Pio is detected within the lumen. This timing of Pio release in the lumen is critical for the model proposed by Drees et al. This is an important point to assess the difference between the use of the antibody (which mostly labels the lumen) while the piomCherry::pio line is mostly at the membrane.

      We agree with the reviewer that the Pio antibody shows a different pattern within the tube lumen of earlier stages. The Pio antibody shows intense extracellular staining from early stage 12 onwards, presumably due to its early function at dorsal and ventral branches, as shown by Anna Jaźwińska (Jaźwińska et al., 2003). The intense luminal Pio antibody staining, predominantly at the chitin cable, persists until its disappearance due to airway protein clearance during stage 17. Unfortunately, this strong luminal Pio staining made it impossible to examine the Pio distribution pattern in more detail during stage 16. Nevertheless, Np overexpression experiments indicate that luminal Pio release occurs specifically in stage 16 embryos (Fig. S13), which was tested with the Pio antibody, see results, second last paragraph:

      “Our data assumes that Np overexpression may enhance Pio shedding in stage 16 embryos, affecting the Pio-mediated ZP matrix function. Upon breathless (btl)-Gal4-mediated expression of UAS-Np in tracheal cells, we observed a high amount of Pio puncta across the entire tracheal tube lumen, specifically in stage 16 embryos but not in earlier stages (Fig. S13).”

      We further agree with the reviewer that mCherry::Pio was used to characterize in vivo Pio distribution within the dorsal trunk cells and tube lumen during stage 16. The Fig. 5A shows apical mCherry::Pio distribution pattern in early and late stage 16 embryos. Importantly, the appearance of luminal mCherry::Pio increased during stage 16 and mainly enriched at late stage 16. See Figure 5A, red arrowheads point to apical Pio and red arrows to luminal Pio staining.

      Furthermore, as discussed above and shown by different ZEN tools, such as the co-localization and fluorescence intensity profile tools, Pio antibody stainings revealed a punctate pattern at the apical cell membrane of dorsal trunk cells in stage 16 embryos, which is also reflected by the appearance of apical mCherry::Pio puncta at the membrane surface. Additionally, we observed mCherry::Pio puncta within the tube lumen (see the new Figures S4B & S8). Thus, subcellular Pio distribution at the apical cell membrane and lumen was observed for both the Pio antibody staining and the mCherry::Pio pattern.

      Nonetheless, the luminal appearance differs between the Pio antibody staining and mCherry::Pio. The Pio antibody detects a short stretch at the ZP domain and thus detects all possible Pio variants, uncleaved and cleaved. Due to early tracheal Pio function, Pio enriches within the tube lumen in an intense core-like structure, which is recognized by the Pio antibody and is comparable with the Dpy::eYFP pattern. mCherry::Pio also labels all Pio variants, uncleaved and cleaved. The spatiotemporal mCherry::Pio expression pattern (Fig. S5) is comparable with the Pio antibody pattern and the staining at the membrane in stage 16 embryos. However, mCherry::Pio did not enrich in the lumen in a core-like structure, but nonetheless shows overlap with luminal Dpy::eYFP.

      Jaźwińska showed that Pio antibody staining is intracellular in the trachea of stage 11 pio2R-16 point mutation embryos (Jaźwińska et al., 2003; Fig 2d). To understand more about the specificity of the antibody, we performed stainings in the null mutant embryos. In contrast to the high number of intracellular Pio puncta in pio2R-16 point mutation embryos, Pio stainings were much more reduced in pio5m and pio17c mutants, but a low number of Pio puncta were still detectable in the embryos (Fig. S1G,H). It is of note that dpy mutants also showed strongly reduced Pio antibody staining (Fig. S10E). Thus, any discussion of the underlying causes of enriched (Pio antibody) versus non-enriched (mCherry::Pio) luminal staining is speculative. However, observations by Jaźwińska et al. (2003) and our new observations, investigating the Pio antibody stainings in pio null mutants, dpy mutants, eYFP::Dpy embryos and Np overexpression, may hint at the possibility of cross-reactivity of the Pio antibody with other ZP domains, which may intensify the appearance of luminal Pio antibody staining in control embryos.

      Anyway, we clarify the difference in luminal Pio pattern in the discussion as follows:

      “Indeed, the anti-Pio antibody, which detects all different Pio variants, showed a punctuate Pio pattern overlapping with the apical cell membrane markers Crb and Uif at the dorsal trunk cells of stage 16 embryos (Fig. 2; Fig. S3,S4). Additionally, Pio antibody also revealed early tracheal expression from embryonic stage 11 onwards, and due to Pio function in narrow dorsal and ventral branches, strong luminal Pio antibody staining is detectable from early stage 14 until stage 17, when airway protein clearance removes luminal contents. In the pio5m and pio17c mutants Pio stainings were strongly reduced although some puncta were still detectable in the trachea (Fig. S1G,H). Similarly, Pio antibody staining is intracellular in the trachea of stage 11 pio2R-16 point mutation embryos (Jaźwińska et al., 2003). Interestingly, also dpy mutants showed strongly reduced and intracellular Pio antibody staining (Fig. S10E).

      We generated mCherry::Pio as a tool for in vivo Pio expression and localization pattern analysis during tube lumen length expansion. The mCherry::Pio resembled the Pio antibody expression pattern from early tracheal development onwards. However, luminal mCherry::Pio enrichment occurs specifically during stage 16, when tubes expand. The stage 16 embryos showed mCherry::Pio puncta accumulating apically in dorsal trunk cells. Moreover, mCherry::Pio puncta partially overlapped with Dpy::YFP and chitin at the taenidial folds, forming at apical cell membranes. Supported by several observations, such as antibody staining, Video monitoring, FRAP experiments, and Western Blot studies (Figs. 4,5), these findings indicate that Pio may play a significant role at the apical cell membrane and matrix in dorsal trunk cells of stage 16 embryos.”

      3.) Another important point is to explain the discrepancy between the pio mutant alleles. The allele containing a point mutation in the ZP domain shows no over-elongated tubes (Dong et al 2014, Jaźwińska et al. 2003) while the lack of function alleles do.

      The reviewer is correct that the pio2R-16 mutation shows only a disintegration phenotype whereas our pio null mutations show in addition tube length defects. However, Dong et al. showed significantly increased dorsal trunk length in shrub; pio2R-16 double mutant embryos when compared with shrub mutant embryos (Supplemental Fig. S4A). Also, the shrub;dpyolvR double mutant embryos revealed increased tube length expansion when compared with shrub mutant embryos. Moreover, their quantifications show that the dpyolvR mutant embryos also revealed significantly increased tube expansion when compared with wt. Altogether, these previous findings suggest that Pio and Dpy are involved in tube length control during stage 16.

      Furthermore, we generated three independent pio null mutation alleles, which lost all the essential Pio protein domains, and all caused embryonic lethality, gas-filling defects, the branch disintegration phenotype and tube length defects (quantifications are shown in Figs. 9 and S1). In addition, pio null mutations prevent Dpy::eYFP secretion. Thus, we are confident that the observed tube length defects as well as the air-filling defects are due to the loss of Pio, in particular since these defects could be rescued by Pio expression in the pio null mutation background, as shown in Fig. 3B.

      So, what could make the difference?

      The described pio2R-16 mutation allele contains a X-ray induced single point mutation that led to an amino acid replacement (V159D) in the ZP domain. It is not clear how the amino acid exchange affects the protein and the ZP domain. It may hamper pio function and maybe this amino acid replacement is problematic for the early tracheal function but not during stage 16. As stated by Jazwinska et al. 2003 (Fig. 2 legend), Pio antibody staining is intracellular in the mutants and extracellular in the trachea of wt at stage 13.

      They further speculate that the mutant Pio protein may be retained in the secretory pathway, but this is not confirmed with co-markers. As luminal Pio function is required to provide a barrier for autocellular AJ formation, this fails in the pio2R-16 mutant. In contrast, it is still possible that Pio interacts with Dpy and supports its secretion in the pio2R-16 mutant, and it is additionally conceivable that intracellular Pio may reach the apical cell membrane to some extent in stage 16 pio2R-16 mutants and thus can support tube size control. But these assumptions are speculations.

      Nevertheless, to clarify this point we explain the discrepancy between the pio2R-16 mutation and pio null mutations alleles as follows:

      “Using CRISPR/Cas9, we generated three pio lack of function alleles (Fig. S1A), all exhibiting embryonic lethality and identical tracheal mutant phenotypes. The tracheal phenotypes of pio5m are shown in the supplement (Fig. S1B-F). In all other Figures, we show images of the pio17c allele. The pio17c and pio5m null mutant embryos revealed the dorsal and ventral branch disintegration phenotype known from a previously described pio2R-16 mutation allele which contains a X-ray induced single point mutation that led to an amino acid replacement (V159D) in the ZP domain (Jaźwińska et al., 2003). Additionally, the late stage 16 pio17c and pio5m null mutant embryos showed over-elongated tracheal dorsal trunk tubes (see below).”

      4.) A minor point, the author should provide hypothesis to explain why only the clearance of CBP, Obstructor-A and Knickkopf are affected in a pio mutant background and not Serpentine and Vermiform.

      We thank the reviewer for careful reading and the comment on this point. We would be happy to see such a scenario, which could give us a hint of Pio interaction partners at the chitinous matrix. However, we stated that luminal material, such as Obst-A and Knk, is removed from the lumen (see Fig. S5A). We further describe that in pio mutant embryos, luminal Serp and Verm staining appeared reduced but showed wt-like distribution (see Fig. S6) in stage 16 embryos. We do not show Serp and Verm in stage 17 embryos, but they are removed from the tube lumen (not shown). These data were obtained from immunostainings and confocal analysis.

      Nevertheless, we also state that pio mutant embryos revealed lumen clearance defects in TEM analysis, of undefined material in the tube lumen (see Fig. 1D and Fig. S2B).

      To clarify this point we state in the results as follows:

      “Fourth, ultrastructure TEM images revealed aECM remnants in the airway lumen of pio mutant stage 17 embryos, while control embryos cleared their airways (Fig. S2B). Consistently, the in vivo analysis of airways in stage 17 pio mutant embryos revealed lack of tracheal air-filling (Fig 3B). The pan-tracheal expression of Pio in pio mutant embryos rescued the lack of gas filling (Fig 3B). Thus, TEM images suggest that pio mutant embryos showed impaired tube lumen clearance of aECM, which prevented subsequent airway gas-filling.”

      And

      “Also, the pio mutant embryos showed tracheal lumen clearance defects of chitin fibers in ultrastructure (TEM) analysis (Figs. 1D, S2B). In contrast, confocal analysis revealed that well-known chitin matrix proteins, such as Obstructor-A (Obst-A) and Knickkopf (Knk), are removed from the lumen of pio mutants (Fig. S5A). These results suggest that the Pio function did not affect airway clearance of Obst-A and Knk and therefore did not play a central role in airway clearance like Wurst. Nevertheless, airway clearance defects observed in TEM images in pio null mutant embryos and, in addition, defective tube lumen morphology in wurst;pio transheterozygous mutant embryos explain the occurrence of airway gas filling defects.”

      5.) Pio and Dumpy. Dumpy (Dpy) is another ZP domain protein secreted by the tracheal cells and detected in the lumen. To follow Dpy distribution, Drees and colleagues used a Dpy::eYFP protein trap line, the same used in Dong et al. However, in this latter paper, Dong et al. stated, using a Crb staining, that Dpy is not at the apical cell surface but only in the lumen. However, Drees and colleagues reported (line 227 and Fig. 4C) that Dpy appears both at the apical cell surface and in the lumen of the tracheal system. But they did not show a co-localisation with an apical marker. Furthermore, in their previous work (Drees et al. 2019), they called the apical staining a "peripheral shell" layer. In addition, in S2R+ cell culture, it is only when Pio and Dpy are co-expressed that Dpy is detected at the cell membrane. The in vivo localisation of Dpy is an important point that needs to be clarified as it is of importance for the final model proposed in Supp Fig. 9.<br /> Drees et al. also performed FRAP experiments on Dpy::eYFP protein trap embryos. As expected as already shown by Dong et al.

      The referee is correct, we state “In stage 16 embryos Dpy::eYFP (Lye et al., 2014) appears at the tracheal apical cell surface and predominantly within the lumen (Fig. 4C).” The corresponding Fig. 4C reveals Dumpy::eYFP staining overlapping with chitin at two subcellular regions: Dpy is enriched as a core-like structure within the lumen overlapping with the chitin cable of the control embryos. Additionally, Dpy::eYFP overlaps with the chitin part that might be part of the apical cell surface. But this observation is hard to see in the images in Fig. 4C and we apologize for it. We therefore repeated the Dpy::eYFP localization analysis and analyzed it in more detail with the ZEN profile tool, which shows peak fluorescence pixel intensities of different channels and makes it possible to test whether they overlap in the XY axis.

      We asked first whether cbp (chitin) appears at the apical surface of dorsal trunk cells when Pio becomes cleaved and released. In mid stage 16 embryos cbp staining appeared in the luminal chitin cable and additionally in a distinctive pattern, which fits the pattern of taenidial folds that start to form. We therefore used the apical cell membrane marker Crumbs to co-stain cbp. Airyscan microscopy fluorescence intensity profile analysis and the corresponding close-up images confirmed the overlap of Crb and cbp stainings at this distinctive pattern, indicating that it shows the chitin matrix at the apical cell surface (Fig. S8A). But there was no overlap of cbp and Crb at the chitin cable structure. Thus, knowing the localization of the apical cell surface chitin matrix, we performed co-stainings of cbp with mCherry::Pio (RFP antibody). This revealed, as expected, overlap of cbp and RFP antibody staining at the apical cell surface chitin matrix (distinct pattern) and with the luminal chitin cable (Fig. S8B,C). Finally, we repeated the stainings and analysis with cbp, mCherry::Pio (RFP antibody) and Dpy::eYFP (GFP antibody). First, these results revealed overlap of Dpy::eYFP and cbp at the apical cell surface and in the tube lumen (Fig. S8D) and second, overlap of punctate staining of Dpy::eYFP, cbp and mCherry::Pio at the apical cell surface chitin matrix and also at the luminal chitin cable (Fig. S8E).

      The lack of extracellular Dpy::eYFP staining in pio mutant embryos is clearly visible in the images and Z-projection in Fig. 4C. Dpy::eYFP was enriched intracellularly, and thus the Dpy::eYFP mislocalization caused by the pio mutation fits well with our results from S2R+ cell culture. As the reviewer notes, it is only when Pio and Dpy are co-expressed that Dpy is detected at the cell membrane.

      Altogether, Fig. 4C, cell culture experiments and our new stainings support our model, that Pio and Dumpy interact and are co-secreted at the apical cell membrane/surface, where Np mediates Pio cleavage. As requested by reviewer 2, we moved the model to Fig. 9. As requested by reviewer 1, we extended the model for timing events.

      A minor point, the Dpy::eYFP protein trap line used in this study is not listed in the Materials and Methods section of the supplementary data.

      Thanks, we included it into the List of sources (Supplement). This YFP-trap line (called CPTI lines) was published by Claire M. Lye et al., Development, 141, 2014. We cite it in our manuscript.

      6.) The serine protease Np and Pio release. Drees and colleagues have previously shown, performing in vitro studies, that the protease Notopleural (Np) cleaves the Pio ZP domain (Drees et al. 2019). Here the authors went a step further in demonstrating that it is also true in vivo at stage 17. In addition, they showed that, in Np mutant embryos, mCherry::Pio is mostly detected within tracheal cells and the luminal staining is strongly reduced. In this mutant context, the authors conducted FRAP experiments on the mCherry::Pio signal, even though it is very weak in the lumen. They showed hardly any recovery after photobleaching.<br /> In Drosophila S2 cells, Drees and colleagues showed that co-expression of the catalytically inactive NpS990A with mCherry::Pio yielded the 90 kDa mCherry::Pio variant as a prominent signal in the cell lysate (Fig. 5B), and live imaging revealed mCherry::Pio localisation at the cell surface (Fig. S6B). However, in this inactive form context, a strong signal is also detected at 60 kDa corresponding to a cleaved form of the Pio ZP domain (Fig. 5B), and Pio localisation at the cell surface appears weaker than in controls. The authors did not consider that another protease could be at play.<br /> On the other hand, in their previous work, Drees et al. identified a mutant form of Pio (PioR196A) which is resistant to Np cleavage in vitro. It will be a step forward to establish by CRISPR/Cas9, as the authors seem to be successful with this technique, a mutant line carrying this point mutation. It will be important to determine whether the observed phenotype resembles that of a mutant Np phenotype.<br /> In their previous work (PLOS Genetics 2019), in Np mutant embryos, Drees et al. did not report "bulge-like" deformations from stage 16 onwards leading to the detachment of the tracheal cells from their adjacent aECM. Either the alleles or the allelic combination is different between the two studies which could explain this difference, or it is a new phenotype that has not been previously described. In the latter case, it becomes important to quantify the proportion of segments showing these bubbles. Is this a rare phenotype to observe?

      We thank the reviewer for the very interesting comments, the careful reading of our manuscript and the very useful suggestions. We agree that we cannot exclude the possibility that another protease is involved in the cleavage of Pio. Therefore, we included this important point in the discussion section as follows:

      “Unknown proteases may likely be involved in Pio processing since cleaved mCherry::Pio is also detectable in inactive NpS990A cells.”

      We think the generation of the pioR196A mutant to address Pio localization and tracheal phenotypes is a great idea, which we would like to address in future experiments. Unfortunately, the production of a fly line with such a specific point mutation at this position will take several months, not including the subsequent evaluation and phenotypic analysis of this fly line and its mutants. Therefore, we apologize that we cannot pursue this question experimentally. Nevertheless, mentioning the possibility and the requirement of such an experiment is important and we discuss it as follows:

      “Previously we identified a mutation at the Pio ZP domain (R196A) resistant to NP cleavage in cell culture experiments (Drees et al., 2019). Establishing a corresponding mutant fly line would be essential in determining whether the observed phenotype resembles the phenotype of the Np mutant embryos.”

      However, knowing that we are not able to provide a new mutant fly line to evaluate the formation of the dorsal tube when an NP non-cleavable form of Pio is expressed, we sought to use an alternative approach by overexpressing Np in the trachea with btl-Gal4. This shows a clear pairing of Np overexpression and Pio release specifically at stage 16 dorsal trunk and associated tube overexpansion.

      Finally, the reviewer is correct, we did not mention the appearance of bulges in Np mutant tracheal dorsal trunk cells in our previous publication. We used the same Np alleles in 2019, and a closer look at the 2019 publication likewise shows the appearance of bulges in Np mutant embryos, e.g. Fig. 1B (red-dextran, left part of the tracheal lumen shows bulges) and even the Dpy::YFP matrix tearing off at the site of bulges (Fig. 4F’’, above the arrowhead). But at the time we did not know the link with Pio and Dumpy.

      However, we agree that it is important to know more about the appearance of the phenotype by means of quantifications. The quantification of bulges per dorsal trunk (n=16) is shown in Fig. 7B.

      7.) Minor point: I don't understand what the authors are trying to show in supplementary Figure 8. Tracheal cells detach and are found in the lumen?

      We are sorry for the unclear description in the legend. We corrected it as follows in the legend of Fig. S12:

      “This indicates disintegration of apical cell membrane at bulges and subsequent leaking of cellular content into the lumen.”

      8.) Np function conserved matriptase.<br /> In this work, Drees and colleagues showed that Np controls in vivo the cleavage of the Pio ZP domain.<br /> Dumpy and Piopio are not conserved in vertebrates but they both contain a ZP domain which is conserved. The authors tested if other ZP proteins can be cleaved by Np or the human homolog Matriptase. The authors tested in cell culture the ability of the type III Transforming growth factor-β receptor, which contains a ZP domain, to be cleaved either by Np or Matriptase.<br /> This could be a general mechanism that needs to be extended to other ZP domain proteins and that could be at play to structure the matrix and give it its physical properties.<br /> However, as it is all speculative, I find the discussion section related to these data far too long, and it does not help to better understand the work done on the formation of the tracheal tubes of the Drosophila embryo.

      We show that Np mediates cleavage of the Pio ZP domain in vitro and in vivo in Drosophila embryos. We further showed that the human matriptase was also able to cleave the Pio ZP domain. To understand if this is a more general mechanism, we extended our studies with the human TβRIII and its ZP domain. These data show that both Drosophila and human matriptases are able to cleave ZP domains of different proteins from different species. These data suggest that Matriptase-mediated ZP domain cleavage is not a Drosophila-specific mechanism. We cannot follow the referee's argument that this is all speculative. Nevertheless, we agree that follow-up studies will be needed to show that the mechanism extends beyond two species and two ZP domain proteins. As requested by the referee, we deleted the following sentences of the paragraph, since they are speculative in the context of our manuscript and do not directly describe a potential matriptase and ZP domain function:

      “Matriptase degrades receptors and ECM in pulmonary fibrinogenesis in squamous cell carcinoma (Bardou et al., 2016; Martin and List, 2019). TβRIII is a membrane-bound proteoglycan that generates a soluble form upon shedding (López-Casillas et al., 1991), a potent neutralizing agent of TGF-β. Expression of the soluble TβRIII inhibits tumor growth due to the inhibition of angiogenesis (Bandyopadhyay et al., 2002). Idiopathic pulmonary fibrosis (IPF) is associated with a progressive loss of lung function due to fibroblast accumulation and relentless ECM deposition (King et al., 2011; Loomis-King et al., 2013).”

      However, the comparisons of the tubular organ and the phenotypic expressions of the bulging membrane and the aortic aneurysm appear to us as an important element of the article. In both cases, cell membrane loses its integrity and can break in tubular networks. Thus, with our findings on the modification of extracellular ZP proteins, we offer a potential new molecular approach even for clinical investigation.

      9.) Minor points: Pio and cytoskeleton organisation.<br /> Line 78-79, the authors wrongly quoted a work from Brodu et al (2010). Pio does not anchor the microtubule severing enzyme Spastin. Instead, Spastin releases the microtubule-organising centre from its centrosomal location, then Pio contributes to its apical membrane anchoring. It can therefore be assumed that the organisation of the microtubule network is affected in a pio null mutant. In addition, ZP proteins have been shown to link the aECM to the actin cytoskeleton. Therefore, it would be interesting to look at the organisation of the actin and microtubule cytoskeletons in a pio mutant context in which enlarged apical cell surface area are observed.

      We are very thankful for finding this mistake in the introduction. We corrected it as follows:

      “Further, Pio is involved in relocating microtubule organizing center components γ-TuRC (γ-tubulin and Grips; gamma-tubulin ring proteins). This requires Spastin-mediated release from the centrosome and Pio-mediated γ-TuRC anchoring in the apical membrane.”

      Studying the cytoskeleton in pio mutant embryos is a helpful idea. Therefore, we analyzed F-actin with Phalloidin and beta tubulin (E7 antibody, DSHB) in the dorsal trunk cells of stage 16 control and pio mutant embryos. However, tracheal cells are tiny and only gross changes can be detected. The confocal Z-stack analysis of the stainings did not show gross differences between control and pio mutant embryos. We observe the expected apical subcortical accumulation of the actin and tubulin cytoskeleton in dorsal trunk cells of stage 16 pio mutant embryos, which has also been shown for wt embryos elsewhere. These new data are presented in the supplement Fig. S7.

      _Referees cross-commenting

      I have just read the comments of the other two reviewers, who like me are specialists in the formation of the tracheal system in the drosophila embryo.<br /> I find the comments very fair and balanced. They are in the same spirit as my comments and are very complementary. I hope that all our comments will be constructive for the authors and will improve the quality of their work._

      Reviewer #3 (Significance):

      _Overall, the methodology is sound, the quality of the data is good and the paper is very well written. The authors combine in vivo and in vitro studies as well as a cell culture approach. Using CRISPR/Cas9, they generated a large number of new tools allowing in vivo studies.<br /> Drees and colleagues generated new alleles of pio which are lack of function alleles. They described a new phenotype for pio mutant embryos, namely over-elongated tubes. But the authors do not comment on why these new alleles reveal a new phenotype. Furthermore, using their piomCherry::pio line, the authors state that Pio is localised to the plasma membrane. This location is very difficult to assess. Both new results require clarification.<br /> The authors had already demonstrated that Np cleaves the ZP domain of Pio in vitro. Here they demonstrate this in vivo. It appears important to evaluate the formation of the dorsal tube when an Np non-cleavable form of Pio is expressed.<br /> Finally, the model proposing a coupling between the extracellular matrix and the membrane of tracheal cells is very interesting. The demonstration that cleavage of Pio by Np could participate in this coupling is very interesting for those interested in the integration of mechanical stress and cellular deformation. However, such a model has already been discussed in Dong et al (2014). In this article, Dong et al. proposed that a "coupling of the apical membrane and Dpy matrix core is essential for tube length regulation".

      The audience for this article should be specialised and oriented towards basic research. It may be of interest to people working on tubular systems or working on ZP proteins.

      My field of expertise is cell biology and developmental biology in Drosophila and the formation of tubular networks._

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their constructive criticism, which helped us to improve the paper. We modified Fig.6I and Fig.7, replaced Fig.8, and added supplementary Figs. 3-5 and supplementary Tables S1-2. The manuscript was extensively rewritten. A new paragraph in the Discussion section relates relative adhesiveness to absolute adhesion strength, and the cadherin knockdown result to earlier findings.

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary: This work examines the relationship between cell-cell contacts and pericellular matrix in Xenopus chordamesoderm, which is a tissue actively involved in convergent extension during gastrulation. By lanthanum staining of pericellular materials, the authors found that different types of pericellular matrix are present in cell-cell contacts in the chordamesoderm, which may mediate cell-cell adhesion. Knockdown of C-cadherin, Syndecan-4, fibronectin, and hyaluronic acid leads to the reduced abundance of cell contacts and cell packing density, but this does not seem to affect convergent extension. Based on these observations, the authors propose a model in which cell-cell contacts involve the interdigitation of distinct pericellular matrix units.<br /> Major points:

      1. Knockdown of adhesion molecules separates cells and leads to wide contacts with large interstitial spaces. Data in figure 1 show loosely packed morphant chordamesoderm cells. Intuitively, these should reduce cell-cell adhesion. However, a main conclusion from this manuscript is that reduced abundance of narrower contacts does not decrease adhesiveness. Although depletion of adhesion molecules modifies but does not abolish a contact, non-attached free surfaces increase significantly in morphant cells. It is therefore not easy to understand how reduced cell contacts have no effect on cell adhesion.

      We added a section to the Discussion to address this issue (p.11ff). We show in the Results section (modified Fig.7) that relative adhesiveness is indeed significantly reduced in the morphants (Syn-4 always being the exception) when compared in the contact width range of normal chordamesoderm. However, contact width is strongly increased in the morphants, and adhesiveness increases linearly with width. We argue that these effects compensate for the initial lowering of adhesiveness. In other words, adhesive contacts become shorter (more gap surface) but wider (see Fig.6I), and the wider they become, the more adhesive they are. As in the original version of this paper, we then propose a model that explains the empirically observed increase of adhesiveness with width. How the abundance of cell-cell contacts is reduced is not yet clear. Pericellular matrix deployment and structure are strongly affected by adhesion factor knockdown, and contact types are altered. Some contact types seem to widen but remain adhesive, others become non-adhesive, and still others may disappear without being replaced (see last paragraph of Discussion). Adding detail to these notions and clarifying this important issue satisfactorily will require future research.
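
      The compensation argument can be illustrated with a small numerical sketch. All numbers below (slope, intercepts, contact widths) are hypothetical and chosen only to show the arithmetic; they are not taken from the manuscript. The sketch assumes a linear adhesiveness-width relation with the same slope in both conditions and a lower intercept in the morphant.

```python
def mean_adhesiveness(widths_nm, slope, intercept):
    """Mean of a linear adhesiveness-width relation over a set of contact widths."""
    values = [intercept + slope * w for w in widths_nm]
    return sum(values) / len(values)


# Hypothetical parameters: same slope, but the morphant line is shifted down,
# i.e. lower relative adhesiveness at any given contact width.
SLOPE = 0.002            # adhesiveness gain per nm of contact width (assumed)
WT_INTERCEPT = 0.30      # assumed
MORPHANT_INTERCEPT = 0.20  # assumed

wt_widths = [10, 20, 30, 40]          # narrow contacts, mean 25 nm (assumed)
morphant_widths = [20, 60, 100, 140]  # widened contacts, mean 80 nm (assumed)

wt_mean = mean_adhesiveness(wt_widths, SLOPE, WT_INTERCEPT)
morphant_mean = mean_adhesiveness(morphant_widths, SLOPE, MORPHANT_INTERCEPT)

# At a shared width the morphant is less adhesive ...
wt_at_30 = WT_INTERCEPT + SLOPE * 30
morphant_at_30 = MORPHANT_INTERCEPT + SLOPE * 30

# ... yet its average over its own, wider contacts is not lower.
print(f"at 30 nm: wt={wt_at_30:.2f}, morphant={morphant_at_30:.2f}")
print(f"averages: wt={wt_mean:.2f}, morphant={morphant_mean:.2f}")
```

      With these assumed values, the downward shift of the morphant line is offset once the mean width increases by more than the shift divided by the slope, which is the sense in which widening can compensate or overcompensate for the initial reduction.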

      Importantly, the adhesiveness was not experimentally tested.

      Due to external circumstances, we were unable to perform additional experiments. However, we used our previously published quantitative data on adhesion in gastrula tissues including the chordamesoderm to interpret our present results for normal and C-cad-depleted chordamesoderm, and to relate relative adhesiveness to absolute adhesion strength, in a new section of the Discussion (p.11ff).

      2. It is surprising that reduced cell contacts, at least narrower cell contacts, do not affect convergent extension. Does this mean that active cell behavior changes in the chordamesoderm, which are required for convergent extension, are independent of cell contact types?

      We actually claimed that all treatments inhibited convergent extension, except for Syn-4 (Barua et al. 2021, and this manuscript, p.3, Fig.1B,C). Syn-4 knockdown had a dramatic effect on cell contacts, cell density and cell shape but none on convergent extension, at least up to the middle gastrula stage. This is surprising and does not fit easily with current views of cell intercalation during convergent extension, but analysing the underlying cell behaviors is beyond the scope of this article.

      3. Although the formation and localization of pericellular materials are differentially affected after knockdown of adhesion molecules, there is no clear evidence showing that different types of pericellular matrix mediate cell-cell adhesion in the chordamesoderm. It is possible that the disrupted distribution of pericellular materials in morphants only represents a secondary consequence of changed cell contacts. This may be supported by the fact that knockdown of adhesion molecules reduces narrow contacts and increases LSM-free gaps.
      4. The relationship between contact width spectra and LSM is also very elusive. Again, changes in contact width or abundance and distribution of LSM may be indirectly caused by loss of adhesion molecules. Therefore, although knockdown of adhesion molecules leads to changes of LSM localization, it cannot be concluded that cell-cell contacts in chordamesoderm are mediated by different types of pericellular matrix.

      We find it difficult to interpret, for example, Fig.5A-F other than by assuming an adhesive role for the pericellular matrix, in this case LSM, in normal and morphant tissue. What else would hold two cells together here, between two gaps? The contacts are often much too wide for cadherin-cadherin binding. We indeed believe that changes in contact width or abundance are caused by the loss of adhesion molecules, directly or indirectly. Our LSM images show that, remarkably, modified contacts (e.g. Fig.3D,F; Fig.5B,C) are still able to keep cells together over some distance, between interstitial gaps, and our quantitative data similarly indicate that e.g. contact widening is consistent with continued adhesion. However, some of the contacts may become non-adhesive, or be lost without being replaced, increasing non-adhesive gap surface. This is now discussed on p.11, middle paragraph.

      5. In contrast to the present observations, work by others using the same morpholinos has shown that Cadherin-dependent cell adhesion, fibronectin-rich extracellular matrix, and Syndecan-4-regulated non-canonical Wnt signaling are required for convergent extension. These discrepancies need to be appropriately addressed.

      As mentioned above, we found that all treatments affected convergent extension, as expected from the work of others and our own, except for Syn-4 depletion. We noticed that in the paper by Munoz et al. on Syn-4 overexpression and knockdown, only late gastrula/early neurula stages were evaluated. Syn-4 knockdown produced moderately strong axis defects, perhaps in part related to impaired neural plate closure. Unfortunately, we did not follow our morphants to these later stages to see whether defects developed then. But our main interest here is cell-cell contacts.

      6. If LSM and LSM-free contacts are similarly adhesive, what is the role of LSM in cell adhesion, and how is cell adhesion established in these LSM-free contacts?

      We discuss now more explicitly the notion that gastrula non-epithelial cell adhesion is mediated by a mosaic of pericellular matrix patches of different composition, some containing LSM in different configurations, others not, but each similarly adhesive.

      Minor points:<br /> 1. It may be helpful to clearly define the pericellular matrix in this particular context and its relationship with LSM. It is also necessary to clarify whether the adhesion molecules examined in this work are considered as components of the pericellular matrix.

      We explain the use of these terms at the end of the first paragraph of the Introduction. The most general term is pericellular matrix; part of it is La3+ labeled – LSM; and some of the LSM can be compared to structures which in other systems are termed glycocalyx. We consider the adhesion molecules examined to be part of the pericellular matrix but are aware of other putative functions, like in cell signaling, which may indirectly affect contacts and thus contribute nevertheless to the phenomena studied here.

      2. In figure 1B, it appears that the Cadherin morphant has defects in chordamesoderm elongation and archenteron formation, suggesting impaired convergent extension.

      We find, in agreement with the work of others, that C-cad knockdown impairs convergent extension, and mention this when we describe Fig.1B.

      3. In figure 1C, the Syndecan-4 morphant gastrula clearly shows enhanced anteroposterior elongation of chordamesoderm and archenteron in comparison with the wild-type embryo. This seems to suggest that loss of Syndecan-4 promotes the movements of convergent extension. However, previous studies indicate that both gain and loss of Syndecan-4 impair convergent extension.

      As mentioned above, late gastrula/early neurula stages were evaluated in the Munoz et al. paper, mid-gastrula stages in our work. One possible explanation would be that mild axis defects develop later, partly in connection with neural tube elongation and closure.

      4. Ideally, in knockdown experiments, control embryos should be injected with corresponding mismatch morpholinos.

      We explain in the Methods section that we only used morpholinos that were extensively characterized in previous publications.

      5. In figure 1E, it is unclear what type of cell contacts the light green arrowheads indicate.

      This is explained now in the figure legend.

      6. Figure 1 legend, "(wt) is from Barua et al. 2021". I am not sure it is appropriate to use previously published data.

      The present data were derived by further evaluations of the same samples and TEM sections as used in Barua et al. 2021. We show the previously published data (acknowledged in the legends) here for easy comparison (instead of citing the previous paper).

      7. There is no light blue arrowhead in figure 2, and in figure 3B and 3I, it seems that the same colored arrows are used to indicate different structures.

      This has been corrected.

      8. Triple-layered contacts are not clearly defined.

      We define this term now repeatedly, as consisting of two LSM layers enclosing a non-labeled layer between them.

      9. Page 2, "based on driven by" should be either "based on" or "driven by".

      Has been corrected.

      10. Page 8, "selectin" should be "selecting".

      Has been corrected.

      Reviewer #1 (Significance):

      Strengths:<br /> Demonstrated the effects of several adhesion molecules on the formation of cell contacts and pericellular matrix in Xenopus chordamesoderm.<br /> Limitations:<br /> The significance of chordamesoderm cell contact changes in convergent extension or gastrulation is not clear;

      Effects on gastrulation of PCM or membrane adhesion molecule depletion have very often been described as mediated by effects on cell signaling. Without excluding such possibilities, we would like to redirect attention here to other putative mechanisms by describing basic effects of treatments on cell-cell contacts, including PCM deployment and structure. Future work must relate the specific, often dramatic, contact changes upon depletion of a specific factor to cell behavior during convergent extension and other tissue movements.

      there is no direct evidence showing the functional link between pericellular matrix, cell contacts and cell adhesion;

      Please see our response to main points 3 and 4 above.

      the absence of effects on convergent extension after depletion of several adhesion molecules is not fully consistent with previous reports.

      Please see our response to main points 2 and 5 and minor point 3 above.

      Advance: This work likely provides some fundamental and methodological advances for studying cell-cell adhesion. It shows promise for elucidating mechanisms underlying the regulation of cell contact changes in tissues involved in morphogenetic movements.<br /> Audience:<br /> This work is likely to interest readers studying embryonic cell adhesion in the fields of developmental biology and cell biology. It may also be of interest to people working on glycocalyx pericellular matrix in adult tissues.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary: During gastrulation, cells within vertebrate embryos require the ability to both adhere to one another and rearrange with their neighbors to shape the emerging body plan. These authors posit that such flexible adhesive contacts are mediated in part by the pericellular matrix (PCM), including multiple types of glycocalyces containing molecules such as fibronectin, hyaluronic acid, and syndecans, which they previously characterized in multiple embryonic tissues (Barua et al, PNAS, 2021). Here, in a follow-up to their 2021 study, the authors use electron microscopy to characterize the pericellular matrix within the chordamesoderm of Xenopus gastrulae. They identify several types of adhesive contacts within the chordamesoderm and assess how they are altered in the absence of key PCM molecules via morpholino knock-down. They conclude that syndecan-4 and hyaluronic acid comprise and promote assembly of PCM plaques whereas fibronectin and C-cadherin anchor them to cell surfaces. Cell packing density is decreased upon loss of all 4 of these molecules, which the authors attribute to a decrease in the number of cell contacts without affecting the strength of the remaining contacts. They further conclude that adhesiveness increases linearly with contact width, and that this relationship is unaffected by loss of any aforementioned adhesive/ PCM molecules.

      Major comments:<br /> Many conclusions in this manuscript are based on measurements of cell contact angles, which indicate the reduction of tension at cell contacts vs. free cell surfaces and thus relative adhesive strength. While this lab previously applied the same approach to live tissues (David et al, 2014), it is not clear to what extent such measurements accurately reflect adhesive strength in fixed tissues and/or electron micrographs, especially given the issue of random sectioning planes, which cause distortion of contact angles. Although a correction was applied, the authors note this is not theoretically derived because the heterogeneity of gap sizes made such calculations too difficult. Indeed, it appears that the large gaps between cells within morphant embryos affect contact angle measurements, but if this is corrected for in any way, it is not mentioned.

      Geometrically determined contact angle distortion should affect angle or relative adhesiveness distributions similarly in all conditions or treatments, and thus should have little or no effect on comparisons of distribution peaks, averages, etc. Beyond this effect of random sectioning planes, we don’t see how large contact width should by itself affect measurements of angles.

      Because this is the sole measure of cell adhesion provided in the study, this reviewer is not convinced of the conclusion that loss of PCM components does not affect adhesive strength.

      In response to this criticism, we re-evaluated our adhesiveness-width data (Fig.7A-E). We noticed that there is indeed a reduction of relative adhesiveness when morphants are compared to normal chordamesoderm within the width range of the latter. But the addition of increased widths in the morphants and the linear increase of adhesiveness with width compensated or overcompensated for the initial reduction of adhesiveness.

      Could such measurements not be made from live cells/tissues after manipulating PCM components, as the lab has done previously? Because the lab already has the necessary reagents and expertise for such experiments, the time and resources needed for such measurements shouldn't be prohibitive.

      Due to circumstances, we were unable to perform additional experiments. However, we used our previously published quantitative data on adhesion in gastrula tissues including the chordamesoderm to analyze our present results for normal and C-cad-depleted chordamesoderm, and to relate relative adhesiveness to absolute adhesion strength, in a section added to the Discussion (p.11ff).

      • As mentioned above, these authors previously measured adhesive strength in live Xenopus cells and tissues (David et al, 2014). In that study, they found that C-cadherin MO reduced relative adhesiveness whereas the current study found that relative adhesiveness actually increases in this condition. What explains this discrepancy?

      We explain now in the new Discussion section (p.11ff) and with the help of supplementary Figure S5 how adhesion strength and relative adhesiveness are related overall (tissue surface vs. cell contacts) and at gaps within a tissue (gap free cell surface vs. cell contacts). In the previous study (David et al, 2014), we discussed relative adhesiveness in relation to overall adhesion strength, and both are decreased upon C-cad knockdown. Here we examined these parameters at interstitial gaps, where we find a small increase of relative adhesiveness, due to overcompensation caused by a strong increase of adhesiveness with contact width. Using our David et al, 2014 data we quantitated the effects. We previously found a similar increase of relative adhesiveness at gaps in C-cad morphant ectoderm (Barua et al. 2017) which we could not explain at the time, but explain now by analogy to our chordamesoderm results.

      • No control morpholinos are used, and for the morpholinos that are used, the doses are very large. An equally high dose of control MO should be used to ensure that all observed phenotypes are specific.

      We detail in the Methods section that we used here and in previous publications only previously characterized morpholinos.

      • It appears that all the images analyzed were collected in the sagittal plane, and the analyses don't seem to consider the intrinsic polarity of the chordamesoderm. For example: cells in different positions within the tissue (basal vs. apical), or that WT chordamesoderm cells are mediolaterally polarized and actively intercalating whereas disruption of PCM components like fibronectin disrupts cell intercalation and randomizes cell polarity. It is possible that 1) cell-matrix (in basal cells) and 2) cell-cell (during intercalation) interactions may affect the measurements made in this study. In other words, that cell contacts could differ by position within the embryo and intercalation/polarity status... have such effects been accounted for in the current analysis?

      Here we only analyzed cell contacts deep in the chordamesoderm. Basal contacts were examined to some extent in Barua and Winklbauer, 2022, apical contacts not yet. Our present analysis is based on sagittal sections. The cells in the chordamesoderm are elongated and aligned mediolaterally but not in register, i.e. they are randomly wedged between each other. Thus, all mediolateral positions in cells should be present in our samples. Nevertheless, trends in the occurrence of contacts related to medial-to-lateral positions on cells (e.g. recognizable in spindle-shaped cells as wide vs narrow cell cross-sections) may have escaped our attention, and in particular, the protrusion-bearing medial and lateral ends of cells may develop special contacts. However, our goal in this study was to analyse basic properties of cell-cell contacts in this tissue, as a foundation for further detailed studies.

      • In this study, the authors state that chordamesoderm movements are preserved in syndecan-4 morphants, and in their 2021 article (Barua et al) they state that convergent extension movements are accelerated. But another study describing this MO found that it causes severe convergent extension defects (Munoz et al, NCB, 2006). What explains this discrepancy?

      In their knockdown experiments, Munoz et al. find relatively mild axis defects in late gastrula/early neurula stage embryos while we studied the mid-gastrula. Perhaps defects develop during later stages in Syn-4 morphant embryos.

      Also, the syn-4 morphant shown in Fig. 1 appears more developmentally advanced than the other embryo... if the embryos are not stage matched it could affect the measurements and conclusions drawn from them.

      Stage matching was not possible, since C-cad and FN morphants did not involute or engage in convergent extension (i.e. were arrested at the initial gastrula stage), whereas Syn-4 morphants appeared to gastrulate faster than normal. Therefore, embryos were strictly time matched. A limitation remains: the time course of cell contact development over gastrulation was considered low priority in this initial study and was thus not determined.

      • In figure 7, the authors plot relative adhesion (measured from contact angles) vs. contact width, then fit regression lines to the lower boundaries of these scatter plots. It is not clear why this analysis is focused only on the lower boundaries rather than considering the full spread of the data. Particularly for syn-4 morphants, whose values do not appear to be concentrated along the lower boundary. This analysis is further confused by the introduction of alpha*, which represents relative adhesiveness relative to the regression.

      The lower boundary line is most convenient to extract (Fig.7A’-E’). But we agree that the “interior” of the scatter plot distribution should also be analyzed. Using average adhesiveness gives rise to artifacts since the density of data points decreases strongly with contact width but also with distance from the lower boundary, leading to the preferential disappearance of large adhesiveness values for higher widths. Instead, we constructed a line tracing the highest density in the scatter plot near the lower boundary (Fig.7B’’-E’’), by determining the positions of adhesiveness distribution peaks in consecutive width brackets (new Fig.8, Fig.S3). We abstained from introducing alpha*.
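
      The bracket-wise peak tracing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for Fig.8 or Fig.S3: the function name, the bracket width, the histogram bin size and the data points are all hypothetical, and the peak is taken simply as the most populated adhesiveness bin in each width bracket.

```python
from collections import Counter


def peak_per_width_bracket(points, bracket_nm, bin_size):
    """Return {bracket_start_nm: adhesiveness at the histogram peak}.

    points     -- iterable of (contact_width_nm, relative_adhesiveness)
    bracket_nm -- width of each consecutive contact-width bracket
    bin_size   -- bin size of the adhesiveness histogram inside a bracket
    """
    histograms = {}
    for width, alpha in points:
        bracket_start = int(width // bracket_nm) * bracket_nm
        alpha_bin = int(alpha // bin_size)
        histograms.setdefault(bracket_start, Counter())[alpha_bin] += 1

    peaks = {}
    for bracket_start in sorted(histograms):
        counts = histograms[bracket_start]
        # Most populated bin; ties are resolved in favour of the lower bin.
        best_bin, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
        peaks[bracket_start] = (best_bin + 0.5) * bin_size  # bin centre
    return peaks


# Hypothetical (width in nm, relative adhesiveness) measurements.
points = [
    (12, 0.12), (15, 0.14), (18, 0.33),
    (35, 0.22), (38, 0.27), (31, 0.55),
    (70, 0.41), (75, 0.44), (79, 0.12),
]

peaks = peak_per_width_bracket(points, bracket_nm=30, bin_size=0.1)
for start, alpha_peak in peaks.items():
    print(f"{start}-{start + 30} nm: peak near alpha = {alpha_peak:.2f}")
```

      Following the mode rather than the mean in each bracket avoids the bias noted above, where sparse high-adhesiveness values at large widths drop out of an average; with real data, the bracket width and bin size would need to be matched to the number of points per bracket.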

      • Based on these regression lines alone, the authors conclude that all 4 conditions are similar enough to pool the data for further analysis. If these contacts have different properties, which the data in Figures 1-6 suggest they do, it seems inappropriate to pool them together.

      We no longer pool the data, except in supplementary Fig.S4 where we consider angle distortion. Instead, we show in Fig.8 relative-adhesiveness frequency distributions for different treatments and width brackets. This emphasizes differences between the different adhesion factor depletions and shows that adhesiveness is not simply normally or log-normally distributed, in agreement with different contact types contributing differently though similarly to overall adhesion. It also allows main peaks to be followed as they shift position with width, roughly in proportion to the lower surface boundary.

      Based on this pooling, the authors then conclude that relative adhesiveness increases linearly with contact width over the entire width range, regardless of adhesion factor depletion. This again assumes that all contacts (morphant and WT) are functionally equivalent, and that what is observed in morphant embryos in very wide contacts would also hold true in WT contacts. But because WT contacts occupy only a small portion of the width range, we cannot know how they would behave if scaled to be wider, and I am not convinced that very wide morphant contacts are representative of or functionally equivalent to WT. In other words, we cannot know that contact width is the only factor increasing their relative adhesion, given the experimental manipulations that structurally alter these contacts.

      Although differences between contact types are apparent, we think that the contacts function very similarly. We still hold that relative adhesiveness increases with contact width, as seen in each of the separate plots for wt and adhesion factor depletions. But re-evaluating the alpha-width scatter plots now we show that in the narrow width range of normal chordamesoderm, C-cad, FN and Has depletions show similar, significantly decreased relative adhesiveness (Fig.7A-E). With alpha proportional to width, and width strongly increased in morphants, this initial decrease is compensated in total adhesiveness averages. The relative independence of adhesiveness from contact type could hint at non-specific PCM-PCM adhesion (Winklbauer, 2019). We think that although adhesion factor depletion leads to the loss of some contact types or renders others non-adhesive (thus lowering contact abundances), it modifies some contact types (e.g. by widening them) while only moderately lowering their adhesiveness per unit interaction surface.

      Minor comments<br /> - In their descriptions of PCM in different experimental conditions, the authors overstate some conclusions drawn from EM data. For example, that type I glycocalyces are absent in chordamesoderm (although this signal is only reduced),

      We qualified the statement.

      or that because the Has2 morphant phenotype is intermediate between C-cad and fibronectin morphants this indicates an adhesive role for hyaluronic acid.

      Overall, Has2MO increases the abundance of gaps, i.e. HA normally reduces gaps between cells, strongly suggesting an adhesive role of HA. HA is also required for the formation of 10-20 nm gaps, again proposing a direct or at least indirect adhesion-promoting role.

      • The authors state of the data in figure 1 that "All treatments significantly increase the size of non-adhesive gaps", but they don't show a quantification of the gap size (they show the abundance).

      Has been corrected.

      • The authors state that LSM contacts exist as 10-20 and 20-50 nm subtypes. It is not clear what about the data suggest this division.

      In the LSM width difference spectra, CadMO and SynMO both increase the abundances of ≤ 20 nm contacts and decrease those of 20-50 nm contacts (Fig.4). The different response suggests at least two differently reacting subtypes.

      • In the same paragraph, the authors state that "C-cad and Syn-4... favor LSM width between 20-50 nm." What is meant by "favor"? Given that the number of 20 nm contacts is increased and 50 nm contacts is decreased in both conditions, this statement is unclear.

      The whole paragraph has been reworded.

      • On page 7, the authors say that the size of LSM structures is "consistent with larger plaques being assembled from small units", but if that were the case, wouldn't the plaque sizes be multiples of the size of a single unit? I.e. 100, 200, and 300 nm peaks? Because this is not the case, the data seem more consistent with a continuous range of LSM plaque sizes than with discrete units.

      The size of the units has a peak at 100 nm but a long tail (Fig.6F-H). Moreover, we discuss lateral compression (piling up of PCM material) or active stretching of plaques (to separate units for interdigitation), all factors that would blur plaque length patterns, i.e. we did not expect plaque sizes to be multiples of 100 nm.

      • On page 8, the authors refer repeatedly to LSM volume. Given that these measurements are made from TEM sections, how is volume being measured?

      This is explained now (p.7).

      • The authors present a model in which PCM interdigitates within cell contacts, but this is based on measurements from static tissues alone. Could the measurements of contact width instead be explained by compression of the PCM or some other mechanism? The data as presented don't rule out such possibilities.

      The model is in agreement with the linear increase of relative adhesiveness with contact width, with LSM height at gap surfaces not adding up to adjacent contact width, with visible interdigitation of glycocalyx units (“bushes”) described previously for prechordal mesoderm (Barua et al. 2021), and with the good agreement of calculated unit size with the size of measured LSM units. In addition, it agrees with literature data on endothelial glycocalyx plaques being composed of 100 nm units and of complete interpenetration of glycocalyces during blood cell adhesion.

      Some terms used are not clear, for example: "partial LSM", "triple layer contact", "random removal [of LSM plaques]".

      We point out the meaning of the terms now more clearly. That “partial LSM” is identical with “triple layer contact” (but shorter, for use in figure) is explained in the legend to fig.6.

      • In figure 5, the graphs depict negative "abundance". Recommend "difference in abundance" instead.

      Done. For shortness, Δ Abundance.

      • Statistics: In figure 1I, it is not clear what the asterisk in this graph means or if statistical differences between these groups were determined. And in figure 6, some groups are marked as n.s., but P values for groups that are statistically different are not presented.

      The asterisk in fig.1I was meant to indicate that this column is from Barua et al. 2021, but this is already indicated by different shading and mentioned in the legend. The unused n.s. marks were removed.

      Reviewer #2 (Significance):

      This detailed electron microscopy study advances our understanding of pericellular matrix within vertebrate embryos and how loss of its constituent molecules affects cell interactions. It further addresses the relationship between structurally distinct pericellular matrices and their adhesive properties, although this analysis is less convincing. This study adds to a body of literature in which cell-cell and cell-matrix adhesion are known to regulate morphogenetic cell movements, but how such contacts are remodeled as cells rearrange is poorly understood. Previous work has also used measurements from live cells, embryos, and tissues to infer physical forces within embryos such as adhesive strength, cortical tension, and viscosity. This work follows up directly on a previous study from this group that characterized glycocalyces within various tissues within Xenopus gastrulae by electron microscopy. The hypothesis that pericellular matrix enables flexible/fluid adhesion within highly dynamic embryonic tissues is exciting, and is likely to be of interest to developmental biologists - particularly those who apply mechanical concepts to embryos. However, additional evidence, preferably from live tissues and embryos, is needed to support this hypothesis. This assessment is based on over 15 years' experience studying gastrulation morphogenesis in multiple vertebrate species.

    1. Reviewer #3 (Public Review):

      The authors previously showed that expressing formate dehydrogenase, rubisco, carbonic anhydrase, and phosphoribulokinase in Escherichia coli, followed by experimental evolution, led to the generation of strains that can metabolise CO2. Using two rounds of experimental evolution, the authors identify mutations in three genes - pgi, rpoB, and crp - that allow cells to metabolise CO2 in their engineered strain background. The authors make a strong case that mutations in pgi are loss-of-function mutations that prevent metabolic efflux from the reductive pentose phosphate autocatalytic cycle. The authors also argue that mutations in crp and rpoB lead to an increase in the NADH/NAD+ ratio, which would increase the concentration of the electron donor for carbon fixation. While this may explain the role of the crp and rpoB mutations, there is good reason to think that the two mutations have independent effects, and that the change in NADH/NAD+ ratio may not be the major reason for their importance in the CO2-metabolising strain.

      Specific comments:

      1. Deleting pgi rather than using a point mutation would allow the authors to more rigorously test whether loss-of-function mutants are being selected for in their experimental evolution pipeline. The same argument applies to crp.

      2. Page 10, lines 10-11, the authors state "Since Crp and RpoB are known to physically interact in the cell (26-28), we address them as one unit, as it is hard to decouple the effect of one from the other". CRP and RpoB are connected, but the authors' description of them is misleading. CRP activates transcription by interacting with RNA polymerase holoenzyme, of which the Beta subunit (encoded by rpoB) is a part. The specific interaction of CRP is with a different RNA polymerase subunit. The functions of CRP and RpoB, while both related to transcription, are otherwise very different. The mutations in crp and rpoB are unlikely to be directly functionally connected. Hence, they should be considered separately.

      3. A Beta-galactosidase assay would provide a very simple test of CRP H22N activity. There are also simple in vivo and in vitro assays for transcription activation (two different modes of activation) and DNA-binding. H22 is not near the DNA-binding domain, but may impact overall protein structure.

      4. There are many high-resolution structures of both CRP and RpoB (in the context of RNA polymerase). The authors should compare the position of the sites of mutation of these proteins to known functional regions, assuming H22N is not a loss-of-function mutation in crp.

      5. RNA-seq would provide a simple assay for the effects of the crp and rpoB mutations. While the precise effect of the rpoB mutation on RNA polymerase function may be hard to discern, the overall impact on gene expression would likely be informative.

  5. Jul 2023
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Please find our point-by-point response to the reviewers’ comments below, where we marked all changes implemented in the manuscript in italics.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      With the emergence and spread of resistance to Artemisinin (ART), a key component of current frontline malaria combination therapies, there is a growing effort to understand the mechanisms that lead to ART resistance. Previous work has shown that ART resistant parasites harbour mutations in the Kelch13 protein, which in turn leads to reduced endocytosis of host haemoglobin. The digestion of haemoglobin is thought to be critical for the activation of the artemisinin endoperoxide bridge, leading to the production of free radicals and parasite death. However, the mechanisms by which the parasites endocytose host cell haemoglobin remain poorly understood.

      Previous work by the authors identified several proteins in the proximity of K13 using proximity-based labelling (BioID) (Birnbaum et al. 2020). The authors then went on to characterise several of these proteins, showing that when proteins including EPS15, AP2mu, UBP1 and KIC7 are disrupted, this leads to ART resistance and defects in endocytosis leading to the hypothesis that these two processes are inextricably linked.

      In this manuscript, Schmidt et al. set themselves the task of characterising more K13 compartment candidates identified in their previous work (Birnbaum et al. 2020) that were not previously validated or characterised. They chose 10 candidates and investigated their localisations, and colocalisation with K13, and their involvement in endocytosis and in vitro ART resistance, 2 processes mediated by K13 and some members of the K13 compartment.

      The authors show that of their 10 candidates, only 4 can be co-localised with K13. Then, using a combination of targeted gene disruption (TGD) as well as knock sideways (KS), they characterised these 4 proteins found in the K13 compartment. They show that MyoF and KIC12 are involved in endocytosis and are important for parasite growth, however their disruption does not lead to a change in ART sensitivity. The authors also confirm the findings of their previous publication (Birnbaum et al. 2020), using a slightly different TGD

      (note from the authors: we apologise if this has not properly transpired from the manuscript but the difference between the TGDs is substantial and relevant: one has less than 3% of the protein left and hence can be considered to fully inactivate MCA2 and has a growth defect whereas the other contains about two thirds of the protein (1344 amino acids/~66% are left), has no growth defect, although it lacks the MCA2 domain (hence that domain can not be critical for the growth defect)),

      that MCA2 is involved in ART resistance, however they did not check whether its disruption impacts haemoglobin uptake. They also show that KIC11 is not involved in mediating haemoglobin uptake or ART resistance. To finish, the authors used AlphaFold to identify new domains in the proteins of the K13 compartment. This led them to the conclusion that vesicle trafficking domains are enriched in proteins of the K13 compartment involved in endocytosis and in vitro ART resistance.

      The majority of the experiments conducted by the authors are performed to a good standard in biological and technical replicates, with the correct controls. Their findings provide confirmation that their 4 candidate genes seem to be important for parasite growth, and show that some of their candidates are involved in endocytosis. While the KD and KS approaches employed by the authors to study their candidate genes each have their own advantages and can be excellent tools for studying large sets of genes, this manuscript highlights the many limitations of these approaches. For example, the large tag used for the KS approach can mislocalise proteins or disrupt their function (as is the case for MyoF), resulting in spurious results, or indeed the inability to generate the tagged line (as is the case for MCA2). The KS approach also makes the results of a protein with a dual localisation, like KIC12, extremely difficult to interpret.

      We thank the reviewer for this thorough and insightful review.

      The limitations mentioned above were addressed in the response to the main points, and a general detailed response in regards to the systems used for this research is added at the end of this rebuttal. Briefly summarised here: while we agree that there are limitations of the system used, we are convinced that

      • the advantages of using a large tag in most cases outweigh the drawbacks as it permits tracking the inactivation of the target, if need be on the individual cell level

      • while not optimal for MyoF, the partial inactivation actually helps in its functional study as detailed in major point 23&28 or reviewer#3 major point 11: it shows a consistent correlation of the phenotype with different causes and degrees of inactivation (this is now better illustrated in Figure 1L1M). Further, regarding the concern of the large tag: the effect of the tag based on localisation was overestimated in the review by what seems to have been a mix up comparing numbers from MyoF with a number from MCA2 (there is a difference, but it is only small) (see reviewer#1 major point #23).

      • KS is the optimal method for most of the assays in this work (e.g. bloated food vacuole assays and RSAs); these assays would be impossible or difficult to use with other inactivation systems currently used in P. falciparum research (see details in the response to the specific points and after the rebuttal)

      In regards to the difficulty to interpret KIC12 data: this is only true for measuring absolute essentiality; for everything else we believe we actually have the optimal method. If not KS, which method targets a specific pool of a protein with a dual localisation? Again, our assays targeting the K13 pool and revealing the specific function would have been difficult or impossible with any other system.

      Ultimately the question is whether any other system would have resulted in a different conclusion on the function of the proteins studied. At present we are confident this would not be the case and other systems probably would not have delivered the specific functional data shown in this work. Clearly, more in depth work will provide more nuanced and detailed insights into the proteins analysed in this work and this likely will also include the use of other systems for specific aspects they are most suitable for. However, this (e.g. different complementations in a diCre cKO) is complex and therefore beyond what fits into this work which had the goal to assess which proteins are true positives for the K13 compartment and to place them into functional groups in regards to endocytosis.

      Moreover, the manuscript is disjointed at times, with the authors choosing to conduct certain experiments for only a subset of genes, but not for others. For example, considering that the aim of this paper was to identify more proteins involved in ART resistance and endocytosis, it is confusing why the authors do not perform the endocytosis assays for all their selected proteins, and why they do not do this for the proteins they identify in their domain search. There is significant room for improvement for this manuscript, and a generally interesting question.

      The reviewer remarks that not every experiment was done for every target. Based on the rebuttal we tried to amend this but also note that there was some sentiment by the reviewers to better stick to the point and not make the manuscript more disjointed. We attempted to balance that as much as possible and hope we were able to honour both aspects (amendments were done as detailed in the point by point response below).

      In regards to endocytosis and choice of targets: We did do endocytosis assays for all proteins that showed a growth phenotype upon inactivation in this work. We therefore assume the reviewer here refers to major point #40 asking for endocytosis assays with KIC4 and KIC5 (which were not studied in this manuscript) as well as MCA2 (point 17). We fully agree with the reviewer that this would fill a gap in the work on K13 compartment proteins but such assays are difficult with TGDs (there are issues with non-comparable samples and compensatory effects) and proteins that are not essential (and hence likely have a smaller impact on endocytosis when truncated). We nevertheless now carried them out but, given the limitations of doing this with these lines, we would be hesitant to draw definite conclusions (see major point 17 and 40 for details and outcomes).

      But in its current format, other than confirming that MCA2 is involved in ART resistance (which was already known from the Birnbaum paper), the authors do not further expand our understanding of the link between ART resistance and endocytosis in this manuscript.

      We would like to point out that the importance of the K13 compartment and endocytosis goes beyond ART resistance (see e.g. also newly published papers on the K13 compartment in Toxoplasma, (Wan et al., 2023; Koreny et al., 2023)). Endocytosis is an essential and prominent process in blood stages. However, in contrast to processes such as invasion, our understanding about endocytosis is only rudimentary. Hence, this manuscript provides important insights on an emerging topic that in our opinion deserves more attention:

      • it identifies novel proteins at the K13 compartment and provides 2 new proteins in endocytosis (MyoF and KIC12); getting an as complete as possible list of proteins involved in the process will be critical to study and understand it

      • it leads to the realisation that not all growth-relevant proteins detected at the K13 compartment are needed for endocytosis

      • it provides domains and stage specificity of function for several K13 compartment proteins, overall bolstering the model of endocytosis in ART resistance and providing a framework critical to direct future studies on endocytosis and their detailed mechanistic function at the cytostome

      • the identified vesicle trafficking domains (for instance now also found in UBP1) are expected to strengthen the support for the role of endocytosis of the K13 compartment; this and also the above points are important as (based on the current literature) there still seems to be prominent sentiment in the field that (in part due to the involvement of UBP1 and K13) the cause of ART resistance is due to various unclearly defined stress response pathways

      • with MyoF it also shows the first protein in connection with the K13 compartment that acts downstream of the generation of hemoglobin-filled containers in the parasite and provides the first protein that explains the suspected involvement of actin in endocytosis (so far this was only based on CytD studies)

      Overall we therefore believe this manuscript contains critical information and a framework for future studies on endocytosis and the K13 compartment. We hope the relevance of endocytosis as one of the most prominent and essential processes in the parasites and the connection to various aspects linked with many commercial drugs (in addition to the role of endocytosis in ART resistance), is adequately explained in the introduction. We also would like to mention that the main focus of the work is reflected in the title of the manuscript which does not mention ART susceptibility.

      Major Comments

      1) line 31: please change defined to characterised - defined suggests that novel proteins were identified in this study, which is not the case.

      We apologise, but we do not fully understand this comment. We did identify novel proteins not before known to be at the K13 compartment (MCA2 (admittedly this one was likely but had not previously been verified), MyoF, KIC11 and KIC12). In our view "further defining the composition of the K13 compartment" therefore is an accurate statement. Additionally, the identification of previously not-discovered domains, the stage-specificity and function of these proteins helped to further define the K13 compartment.

      If the reviewer is referring to the fact that the proteins analysed in this study were taken from a previously generated list of hits, we would like to stress that presence in such a list (obtained from a BioID, but also if from an IP etc) cannot be equated with being a true positive; the hits are merely candidates that still need to be experimentally validated. This is what we did in this work to find out which further proteins from the list can be classified as K13 compartment proteins (for hits with lower FDRs this is even more relevant as illustrated by the fact that 6 of the here analysed hits were not at the K13 compartment). In an attempt to address this comment in the manuscript, we changed the wording of this sentence to (line 31): "Here we further defined the composition of the K13 compartment by analysing more hits from a previous BioID, showing that MyoF and MCA2 as well as Kelch13 interaction candidate (KIC) 11 and 12 are found at this site."

      2) line 37: please change 'second' to "another". As explained further below, the authors identified 3 classes of proteins (confer ART resistance + involved in HCCU, involved in HCCU only, or involved in neither).

      We realized that the groups description wasn’t clear in the abstract. Please see response to major comment #41 for a detailed answer to this (endocytosis is an overarching criterion, ART resistance is a subgroup and applies only to those proteins with a function in endocytosis in ring stages). To clarify this (see also major point #8) we added an explanation on the influence of stage-specificity of endocytosis on ART susceptibility to the introduction (line 76): “In contrast to K13 which is only needed for endocytosis in ring stages (the stage relevant for in vitro ART resistance), some of these proteins (AP2µ and UBP1) are also needed for endocytosis in later stage parasites (Birnbaum et al., 2020). At least in the case of UBP1, this is associated with a higher fitness cost but lower resistance compared to K13 mutations (Behrens et al., 2021; Behrens et al., 2023). Hence, the stage-specificity of endocytosis functions is relevant for in vitro ART resistance: proteins influencing endocytosis in trophozoites are expected to have a high fitness cost whereas proteins not needed for endocytosis in rings would not be expected to influence resistance.” The abstract was changed in response to this and other comments and we hope it is now clearer in regards to the groups.

      3) Line 40: You define KIC11 as essential but according to your data some parasites are still alive and replicating 2 cycles after induction of the knock sideways. Please consider changing "essential" to "important for asexual parasite growth".

      We fully agree with the reviewer, we reworded the sentence as suggested.

      4) Line 40: please change 'second group' to 'this group'

      We reworded this part of the abstract and it now reads (line 38): “While this strengthened the link of the K13 compartment to endocytosis, many proteins of this group showed unusual domain combinations and large parasite-specific regions, indicating a high level of taxon-specific adaptation of this process.”

      5) line 41: state here that despite it being essential, it is unknown what it is involved in.

      With the newly added data we show that this protein either has a function in invasion or very early ring development although we did not see any evidence for the latter. We therefore changed the sentence to (line 43): “We here identified the first protein of this group that is important for asexual blood stage development and showed that it likely is involved in invasion.”

      6) Line 50: the authors should state here that there is actually a reversal in this trend over the last few years.

      Done as suggested.

      7) Line 54: please separate out the references for each of the two statements made in this line (a: that ART resistance is widespread in SEA, and b: that ART resistance is now in Africa) Reference 14 also seems to reference ART resistance in Amazonia - which is not covered by the statement made by the authors (in which case the authors should state ART is now present in Africa and South America). The authors should also reference PMID: 34279219 for their statement that ART resistance is now found in Africa (albeit a different mutation to the one found in SEA).

      Done as suggested.

      8) Line 65: it is also worth mentioning here that there are other mutations in proteins other than K13, such as AP2mu and UBP1 (PMID: 24994911;24270944) that can lead to ART resistance.

      As suggested by the reviewer, we included a sentence about non-K13 mutations linked with reduced ART susceptibility in the introduction (line 74): “Beside K13 mutations in other genes, such as Coronin (Demas et al., 2018) UBP1 (Borrmann et al., 2013; Henrici et al., 2020b; Birnbaum et al., 2020; Simwela et al., 2020) or AP2µ (Henriques et al., 2014; Henrici et al., 2020b) have also been linked with reduced ART susceptibility.”

      We here also added data on fitness cost that is related to this and is also relevant for the issue of proteins with a stage-specific function in endocytosis, making a transition for this statement which might help clarifying the grouping of K13 compartment proteins (see also major point #2).

      9) Line 80, 86: ref 43 is misused. Reference 43 refers to Maurer's clefts trafficking which takes place in the erythrocyte cytosol and is not involved in haemoglobin uptake as far as I know. Please replace ref 43 with one showing the role of actin in haemoglobin uptake.

      We thank the reviewer for pointing this out, Ref 43 was removed from the manuscript.

      10) Line 98: the authors state here that they 'identified' further candidates from the K13 proxiome. This suggests that they identified new proteins in this paper, when in fact the list was already generated in ref 26. All they did was characterise proteins from that list that were not previously characterised. The authors should therefore remove identified from this statement.

      We agree with the reviewer that we did not identify further candidates, we identified new K13 compartment proteins from the list of potential K13 compartment proteins. We therefore changed “identified further candidates” into “identified further K13 compartment proteins” (line 116). Please see also response to major comment #1.

      11) Line 107-108: it is not clear from this sentence why these proteins were left out of the initial analysis in Ref 26. A sentence here explaining this would be valuable for the reader.

      This is a good point. One reason why we did not analyse more in our previous publication was that we had to stop somewhere and adding more would have been very difficult to fit into what was already a packed paper. However, as shown in this work, the list does contain further interesting candidates (e.g. K13 compartment proteins that are involved in endocytosis).

      We altered the relevant part of the introduction to highlight that we previously analysed the top hits, clarifying that the 'remaining' hits analysed in this work were further down in the list. This now reads: (line 113)“We reasoned that due to the high number of proteins that turned out to belong to the K13 compartment when validating the top hits of the K13 BioID (Birnbaum et al., 2020), the remaining hits of these experiments might contain further proteins belonging to the K13 compartment.” We hope this clarifies that we simply moved further down in the candidate list.

      12) Line 117-123: The authors say that PF3D7_0204300, PF3D7_1117900 and PF3D7_1016200 were not studied because they were not in the top 10 hits. However, the current organisation of Supplementary Table 1 shows all 3 proteins among the top 10 hits (MyoF, KIC12, UIS14 and 0907200 being after them). I think the authors should reorganise their table. It is also unclear according to what the proteins in the table are ranked. Could the authors indicate the metric used for the ranking?

      We thank the reviewer for alerting us to this. The issue here is that the 3 non-analysed proteins belong to a 'lower stringency' group comprising hits significant with FDRThe information about ranking is now also included as “Table legend” in the revised manuscript and the Table heading has been changed to: List of putative K13 compartment proteins, proteins selected for further characterization in this manuscript are highlighted.”

      13) Line 129-141: Can the authors be clearer with their explanations of the identification of mutation Y1344Stop? One dataset (ref 61) shows that 52% of African parasites have a mutation in MCA2 in position 1344 leading to a STOP codon. But another dataset (ref 62) shows that the next base is also mutated, reverting the stop codon. That should have been seen in the first dataset as well. Could the authors please clarify.

      This mutation was first spotted in the MalariaGEN database (https://www.malariagen.net) (MalariaGEN et al., 2021), which allows online accessing of the data by using the “variant catalogue” tool, which is in a table format of frequency rather than in a sequence context. Hence, only after further research later on it became evident to us, that this mutation does not occur alone when looking at individual MCA2 sequences from patient samples in (Wichers et al., 2021b). We hope this is accurately reflected in our results section.

      14) Line 147: the authors say that MCA2 is expressed throughout the intraerythrocytic cycle as shown by live cell imaging. In Birnbaum et al 2020 fig 4I, the authors show that MCA2 is mainly expressed between 4 and 16hpi. But in Figure 1B of this manuscript there is a clear multiplication of MCA2 signal between trophozoite and schizont. How do the authors explain this discrepancy? Could expression of the truncated MCA2 be different than the full length? This cannot be assessed as expression and localisation of the full-length HA tag MCA2 is not shown in Schizonts.

      The key difference lies in transcription vs protein expression (usually protein levels peak after mRNA levels peak and - depending on turnover - protein levels can stay high even after mRNA levels have declined). Figure 4 of the Birnbaum et al paper presents transcriptomic data, but with a peak in trophozoites (The axis label in Fig. 4l of that publication is a bit confusing, as hour 0 is at the top, 48 h at the bottom; it is clearer in Fig. S13 of that paper) which would fit very well with the multiplication of the signal between trophozoites and schizonts mentioned by the reviewer. So, overall, the temporal peaks of transcripts and protein of that protein fit well.

      For the signal in rings: Likely the protein has a turnover rate that is sufficiently low for some protein to be taken into the new cycle after re-invasion. Also different transcriptomic datasets e.g. (Otto et al., 2010; Wichers et al., 2019; Subudhi et al., 2020) available on plasmoDB show some mRNA present across the complete asexual development cycle, with each dataset showing maximum peak at a slightly different stage.

      Even when located in foci and hence aiding detection of small amounts of protein (as is the case for MCA2-Y1344-GFP), the MCA2 signal in rings is not strong. For MCA2-TGD, the GFP signal is dispersed and therefore likely below our detection limit, while the same amount of protein concentrated at the K13 compartment is visible as foci in the MCA2-Y1344 cell line. Please note that MCA2-TGD has only 2.8% of the protein left whereas MCA2-Y1344 has 66.5% left and based on our manuscript is almost fully functional, hence fitting the different locations between the two versions.

      Overall we believe this shows that there are actually no significant discrepancies of the expression of the different MCA2 versions.

      15) Line 158: would it not have been more useful for the authors to have episomally expressed MCA2-3xHA in their MCA2Y1344STOP-GFPENDO line to make sure that the truncated protein is indeed going to the correct compartment? The experiments done by the authors suggests that the MCA2Y1344STOP goes to the right location but does not really confirm it.

      We appreciate the reviewer's caution here. However, considering that MCA2Y1344STOP-GFPendo co-locates with mCherryK13 and endogenously HA-tagged full length MCA2 does the same to a similar extent, there is in our opinion little doubt that MCA2 is found at the K13 compartment and that this is similar with both constructs. If there are minor differences, these might as well occur if MCA2 is episomally (as suggested in the comment) instead of endogenously expressed. Given the limited insight, we therefore decided against the episomal overexpression (which due to its size of > 6000bp may also be somewhat less straightforward than it may sound).

      16) Line 191: it is stated that MCA2 confers resistance independently of the MCA domain, however in both the MCA2-TGD and MCA2Y1344STOP-GFPENDO parasites, the MCA domain is deleted, and for both parasites, there is resistance (albeit to a lower level in the MCA2Y1344STOP-GFPENDO line). Therefore, how can the authors state that the ART resistance is independent of the MCA domain? This statement should be that resistance is dependent on the loss of the MCA domain.

      We agree that this can’t be categorically excluded. However, a ~5 fold difference in ART sensitivity was observed between the parasites with MCA2 truncated at amino acid 57 compared to those with MCA2 truncated at amino acid 1344 even though both do not contain the MCA2 domain. Hence, at least this difference is not dependent on the MCA2 domain. The larger construct missing the MCA domain shows only a very moderate reduction in RSA survival, again suggesting the MCA domain is not the main factor. We amended our statement in an attempt to more accurately reflect the data (line 487): “This considerable reduction in ART susceptibility in the parasites with the truncation at MCA2 position 57 compared to the parasites still expressing 1344 amino acids of MCA2, despite both versions of the protein lacking the MCA domain, indicates that the influence on ART resistance is not, or only partially due to the MCA domain.” We would be hesitant to state the reviewer's conclusion that “resistance is dependent on the loss of the MCA domain”, as the larger construct missing the MCA2 domain has a milder RSA effect compared to MCA2-TGD, which suggests the reduction in ART susceptibility is independent of the MCA domain. These considerations also agree with the fact that the parasites with the longer MCA2 version (in contrast to the MCA2-TGD) do not have any detectable growth defect which indicates that the protein can fulfil its function without the MCA2 domain.

      17) Line 192: Why did the authors not check if MCA2 is involved in endocytosis? They state later on in the manuscript that they did not do endocytosis assays with TGD lines, however if the authors include the correct controls, this could be easily done. It would also be really interesting to see whether endocytosis gets progressively worse going from WT to MCA2Y1344STOP to MAC2TGD. This experiment (as well as doing endocytosis assays for KIC4 and KIC5 TGD lines) would drastically increase the impact of this study. These experiments would not take more than 3 weeks to perform, and would not require the generation of new lines.

      So far we were very hesitant to do bloated FV assays with TGDs (even though TGDs were available for the genes encoding MCA2 and KIC4 and KIC5). The reasons for this were:

      1. the fact that these proteins could be disrupted indicated either redundancy or only a partial effect on endocytosis which might lead to only small effects that likely are difficult to pick up in an assay scoring for the rather absolute phenotype of bloated vs non-bloated. Using the refined assay measuring FV size could partly amend this but we note that also FV without hemoglobin have a certain size, reducing the relative effect if there are smaller differences.
      2. a TGD line does not permit tightly controlled inactivation of the target which makes comparing the outcome of bloated food vacuole assays difficult if there are smaller growth and stage differences to the 3D7 control.
      3. in contrast to conditional inactivation parasites, the TGD lines had ample time to adapt to loss of the target protein (compensatory mechanisms are well known for endocytosis, for instance in clathrin mediated endocytosis loss of individual components can be compensated (Chen and Schmid, 2020)).

      We nevertheless see the reviewer's point that this should at least be attempted and now conducted these assays (see also major point 40). For MCA2 (as requested in this point), the data is shown in Figure S5C-E. This assay showed that in MCA2-TGD and MCA2Y1344STOP-GFPendo (similar to the 3D7 control) >95% of parasites developed bloated food vacuoles. Additionally, we also measured the parasite and food vacuole size of individual cells in an attempt to solve some of the problems with TGDs with such assays. In order to specifically solve problem 2 mentioned above, we analysed the food vacuoles of similarly sized parasites, however, they were non-distinguishable between the three lines. Of note, in agreement with the reduced parasite proliferation rate (Birnbaum et al., 2020) a general effect on parasite and food vacuole size was observed for MCA2-TGD parasites, indicating reduced development speed in these parasites. Hence, it is possible that a potential endocytosis reduction was accompanied by a slowed growth, and the comparison of similarly sized parasites may have obscured the effect. It is therefore not certain whether there indeed is no endocytosis phenotype, although we can exclude a strong effect in trophozoites.

      Based on the RSA results at least rings can be expected to have a reduced endocytosis in the MCA2-TGD. Apart from options 1-3 mentioned above, it is therefore possible there is an effect restricted to rings, although in that case the reduced growth in trophozoites would be due to other functions of MCA2. Overall, we can conclude that the MCA2-TGD parasites do not have a strongly reduced endocytosis, but given the fact that the parasites are viable, this is not surprising. Whether the MCA2-TGD has no effect at all on endocytosis we would be very hesitant to postulate based on these results.

      18) The authors should consider re-organising the MCA2 section, first showing that the 3xHA tagged line colocalises with K13, then performing the new truncation.

      We attempted to re-organise as suggested but because we now included additional fluorescence microscopy images of schizont and merozoites (in response to reviewer 2 major comment 3) the main figure would become even larger. To prevent this, we kept the 3xHA data in the supplement.

      19) Line 197: Once again ref 43 is not correct to illustrate that actin/myosin is involved in endocytosis

      We thank the reviewer for pointing this out – we removed Ref 43.

      20) Line 202: the authors state that MyoF localises near the food vacuole from ring stage/trophs onwards. However, how can this statement be made in schizonts based on these images (Fig. 2A), where it doesn't look like MyoF is anywhere near the FV? This statement can only be made for schizonts if co-localised with a FV marker (which is done in Fig. 2B), however, based on the number of MyoF foci, it appears that this was not done for schizonts. Please either remove the statement that MyoF is near the food vacuole from trophs onwards (because it is only seen near the FV up until trophs) or show the data in Fig. 2B of schizonts to substantiate these claims.

      This is a valid point. We originally did not focus on schizonts because most markers end up in some focal area in the forming merozoite but other proteins (such as e.g. K13) also have one or more additional foci at the FV, making interpretation unclear, particularly if the schizont is still organizing to become fully segmented. This is why we generally focused the K13 co-localisations on the trophozoite stage to obtain the clearest information on endocytosis. However, given the fact that this manuscript gives the first localization of MyoF in P. falciparum parasites, we now provide a comprehensive time course (Figure 1C, S1A) including schizonts, which show quite a complex pattern: while the MyoF-GFP localization in trophozoites appeared as multiple foci close to K13 and also the FV, the MyoF-GFP pattern changes in late schizonts (fully segmented) and merozoites, appearing as elongated foci no longer close to K13 or the FV. Of note, this pattern has been previously reported for MyoE in P. berghei (Wall et al., 2019).

      We therefore revised the statement about MyoF localization in schizonts to better reflect the observed localization: (line 175): “In late schizonts and merozoite the MyoF-GFP signal was not associated with K13, but showed elongated GFP foci (Figure 1C, S2A) reminiscent of the MyoE signal previously reported in P. berghei schizonts (Wall et al., 2019).”

      21) Line 204-206: what does this statement bring to the paper? Is it to show that it is the real localisation of MyoF because 2 tag cell line show the same localisation? I don't think this is needed, especially as later in the manuscript an HA-tag MyoF line is used and show similar localisation.

      We see the reviewer's point, but prefer to keep this data included in the supplement, particularly because potential differences in the location of tagged MyoF were a major concern.

      Related to the tag issue: in order to get a better understanding of the effect of C-terminally tagging with different sized tags we now performed a more detailed analysis of the MyoF-3xHA cell line (Figure S2F-G), showing that this cell line shows a growth rate similar to the 3D7 wild type parasites, and has fewer vesicles than the 2x-FKBP-GFP-2xFKBP cell line, but still slightly, yet significantly, more than 3D7 parasites. Overall, this indicates that the smaller 3xHA tag has less effect on the parasite than the larger 2x-FKBP-GFP-2xFKBP tag (see also new Figure 1L, showing a correlation of level of inactivation and the endocytosis phenotype for MyoF).

      22) Line 212: The overlap of K13 with MyoF in Figure 2C 3rd panel (1st trophozoite panel) is not obvious, especially as the MyoF signal seems nonexistent. I would advise the authors to replace with a better image. Also, why are there no images of schizonts shown in Figure 2C?

      As suggested we exchanged the trophozoite image of panel Figure 2C (now Figure 1C) and expanded this panel with images covering the complete asexual development cycle including schizonts in response to this and the previous points. As indicated above (point 20), schizont stages are complex to interpret. While late schizonts likely are not very relevant for endocytosis, this is the first description of the location of the protein in this parasite and we therefore now provide a more thorough representation of the MyoF location across asexual stages in Figure 1C and S2A.

      23) Line 217: the spatial association of MyoF with K13 is very different when it is tagged with GFP and when it is tagged with 3xHA. The way the authors word it here, it seems that there is agreement with the two datasets, when this is not in fact the case (59% overlap for MyoF-GFP and only 16% overlap with MyoF-3xHA). These data suggest that the GFP and the multiple FKBP tags are doing something to the protein and therefore maybe the ensuing results using this line should not be trusted or be taken with a pinch of salt.

      We agree with the reviewer that the location of this MyoF-GFP in the cell might differ due to the partial inactivation but in contrast to this comment, the data do not indicate any large differences. It seems the reviewer mixed something up (the 59% mentioned might come from the MCA2 figure?). The data with the two lines with differently tagged MyoF co-localised with K13 are actually quite comparable: GFP-tagged vs HA-tagged MyoF overlapping with K13 was 8% vs 16% full overlap, 12% vs 19% partially overlapping foci, 36% vs 63% foci that were touching but not overlapping (compare what now is Figure 1D and Figure S2C). Only in the 'no overlap' category is there a much smaller proportion in the HA-tagged line. However, given that these are IFAs which on the one hand are more sensitive to see small protein pools but on the other hand also have pitfalls due to fixing of the cells (e.g. a tiny increase in focus size due to fixing could increase the number of touching foci that in live cells might be close but did not touch), some variation relative to the live cells can be expected. We agree though that the partly reduced functionality of MyoF might be the reason for the consistent tendency of a lower overlap even though the difference is much less than indicated in the comment. We added "with a tendency for higher overlap with K13 which might be due to the partial inactivation of the GFP-tagged MyoF" to the sentence "IFA confirmed the focal localisation of MyoF and its spatial association with mCherry-K13 foci".

      While we expect the fact that the difference between these parasites is only small somewhat reduces the "pinch of salt" with the MyoF line, we do agree that the partial functional inactivation of the GFP-tagged MyoF line may have some impact. However, we do not think that this means the results with the MyoF-GFP line are untrustworthy. On the contrary, it provides insights into its function that in some ways is equivalent to a knock down or TGD. Overall all the MyoF lines show: few vesicles occur in the MyoF-HA-line, more in the MyoF-GFP line and even more after knock sideways of MyoF-GFP. Importantly the severity of this phenotype correlates with the growth rates in these lines. Hence, together with the bloated food vacuole assays, this provides consistent data indicating that MyoF has a role in the transport of HCC to the FV and its level of activity correlates with the number of vesicles and growth. To better highlight this, it is now summarised in Figure 1M.

      24) Line 219: the authors state here that they could not detect MyoF-GFP in rings, when in Figure 2C they show MyoF-GFP in rings, and also show that they could detect MyoF in Sup Fig. 3B with the 3xHA tagged line. Is this a labelling mistake in Figure 2C? If the authors could indeed not see MyoF-GFP in rings, this statement should have been made when Figure 2A was presented, and not so late in the manuscript, which causes confusion.

      We thank the reviewer for pointing this out. We now provide a detailed time course (see also previous points) which shows that there is no detectable MyoF-GFP signal during ring stage development until the stage where the parasites start the transition to trophozoites (i.e. MyoF-GFP signal could only be observed in parasites already containing hemozoin). In addition to the extended time course in Figure 1C (previously 2C) we included a panel of example ring stage images below to further highlight this. We also changed the labelling of the parasite with MyoF-GFP signal the reviewer mentions in Figure 1C to “late ring stage” (it already contains hemozoin) to clarify this.

      The description of Figure 1A is now changed to: (line 153) “The tagged MyoF was detectable as foci close to the food vacuole from the stage parasites turned from late rings to young trophozoite stage onwards, while in schizonts multiple MyoF foci were visible (Figure 1A, S2A).”

      Please see our answer to major comment #45 where we provide an explanation for the difference between MyoF-3xHA and MyoF-GFP signal in ring stage parasites.

      [Figure MyoF]

      25) Line 237: Showing a DNA marker (DAPI, Hoechst) for Figure 2E, and subsequent figures using mislocalisation to the nucleus, would help the reader assess efficiency of the mislocalisation.

      Please see response to major comment #64 for a detailed answer on why we did not include DNA staining in the imaging used to assess mislocalization upon knock-sideways.

      26) Line 254-256: authors should show the results of the bloating assay for parental 3D7 parasites (+ and - rapalog) to see whether the MyoF line - rapalog has increased baseline bloating. This applies to all subsequent FV bloating assays.

      We did do several controls for bloated food vacuole assays (including +/- rapalog of an irrelevant knock sideways line as well as using a chemical insult for which the control was 3D7 without treatment) in previous work (Birnbaum et al., 2020), which indicated that there is no effect of rapalog to reduce bloating. Although these controls are more stringent, we nevertheless did a 3D7 +/- rapalog control and added this to the manuscript (Figure S2I). As it is not possible to do this side by side with the assays that are already in the manuscript and the +/- rapalog 3D7 cells consistently showed no or very low numbers of cells without bloating (and stringent controls in the past equally did not show an effect), we believe adding this control once suffices.

      27) Line 254-257: The authors say that because fewer parasites show a bloated food vacuole upon inactivation of MyoF it means that less hemoglobin reached the food vacuole. I understand the authors statement, however, shouldn't they look at the size of the food vacuole, instead of the number of parasites with bloated FV, to make such a statement? This has been done for KIC12 so why not doing it for MyoF?

      This was now done and is provided as Figure 1J-K, S2J. The results confirm the assessment scoring bloated vs non-bloated food vacuoles.

      28) Line 259-261: these results would be difficult to interpret namely because the authors have dying parasites, which is exacerbated with the protein being knocked sideways. The authors should mention the pitfalls their knock sideways and tagging design here. Line 260-261: RSA is an assay relying on measuring parasite growth 1 cycle after a challenge with ART for 6 hours.

      Fortunately, this concern is unfounded, as the survival (measured by parasitemia after one cycle) of the same sample + and - DHA is assessed, isolating the DHA effect independent of potential growth defects which are cancelled out. Hence, if there were parasites dying in the MyoF line (please note that they might not actually die, but simply grow more slowly), this factor applies for both the + and - ART condition. As we are testing for a decreased susceptibility to ART which would manifest as an increased survival in RSA surfacing above 1%, antagonistic effects of reduced MyoF function and ART treatment would not result in detectable differences as without effect, the RSA survival is always close to zero.

      The same applies for the knock sideways where we assess the survival of +rapalog between +ART and -ART. If the reduced MyoF activity of the knock sideways leads to a decreased survival, this applies to both +ART and -ART. Please also note that rapalog was lifted after the DHA pulse (see e.g. Figure S2K).

      That effects on growth are cancelled out is nicely illustrated for proteins where there is a stronger and more rapid effect on growth upon their conditional inactivation. For instance when KIC7 is knocked aside, there is a considerable increase in RSA survival, even though continued inactivation of KIC7 would have a severe growth defect (Birnbaum et al., 2020). Vice versa, a growth defect alone does not result in reduced RSA susceptibility, as evident from knock sideways of an unrelated protein or using a chemical insult (Figure 4H in Birnbaum et al., 2020) or simply slowing the ring stage by e.g. reducing EXP1 levels (Mesén-Ramírez et al., 2019). Hence, a growth reduction is not expected to alter the RSA outcome. And even if it did, it would only lead to an underestimation of the readout if growth is too severely affected (which would be obvious in the + rapalog without DHA sample, which was not the case).

      In that respect it is valuable to have the rapid kinetics of knock sideways which permit inactivation of a protein before severe growth defects occur (although the only partial responsiveness of MyoF clearly is not optimal). In contrast, the absolute loss of a gene (as is the case if diCre is used) prevents (or at least makes it extremely difficult as the timing would need to exactly hit sufficient protein reduction without killing the parasite until the end of the RSA) using this system in these experiments (again see (Mesén-Ramírez et al., 2021) where in an EXP1 diCre-based knock out, RSA was only possible because we complemented with a lowly, episomally expressed EXP1 copy to have parasites with only a partial phenotype to do this assay).
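
      The cancellation argument in this response can be illustrated with a minimal sketch. It assumes that RSA survival is expressed as the parasitemia of the DHA-treated sample relative to the untreated sample of the same culture, in percent. The function name and all numbers are hypothetical and are not data from the manuscript; the slowdown factor stands in for any growth effect that acts equally on both arms.

```python
def rsa_survival_percent(parasitemia_dha: float, parasitemia_control: float) -> float:
    """Parasitemia of the +DHA sample relative to the -DHA sample, in percent."""
    if parasitemia_control <= 0:
        raise ValueError("control parasitemia must be positive")
    return 100.0 * parasitemia_dha / parasitemia_control


# Hypothetical illustration values (assumptions, not measured data).
control_parasitemia = 4.0   # % parasitemia one cycle later, no DHA
dha_survivor_fraction = 0.005  # fraction of parasites surviving the DHA pulse
growth_factor = 0.5         # slowdown that applies to both +DHA and -DHA arms

# Line without a growth defect.
normal = rsa_survival_percent(
    control_parasitemia * dha_survivor_fraction,
    control_parasitemia,
)

# Line with a growth defect: both arms are scaled by the same factor,
# so the factor drops out of the ratio.
slowed = rsa_survival_percent(
    growth_factor * control_parasitemia * dha_survivor_fraction,
    growth_factor * control_parasitemia,
)

print(f"normal: {normal:.3f}%  slowed: {slowed:.3f}%")
```

      Under these assumptions both lines give the same survival value, because a growth factor shared by the two arms divides out. Only a change in the fraction of parasites surviving the DHA pulse would move the readout.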

      29) Line 261-263: the authors state that MyoF has a function in endocytosis but at a different step compared to K13 compartment proteins. I am not sure what they mean here. Can this be clarified?

      The different steps in endocytosis are explained in the introduction and we now tried to further clarify this (line 98). So far VPS45 (Jonscher et al., 2019), Rbsn5 (Sabitzki et al., 2023), Rab5b (Sabitzki et al., 2023), the phosphoinositide-binding protein PX1 (Mukherjee et al., 2022), the host enzyme peroxiredoxin 6 (Wagner et al., 2022) and K13 and some of its compartment proteins (Eps15, AP2µ, KIC7, UBP1) (Birnbaum et al., 2020) have been reported to act at different steps in the endocytic uptake pathway of hemoglobin. While inactivation of VPS45, Rbsn5, Rab5b, PX1 or actin resulted in an accumulation of hemoglobin filled vesicles (Lazarus et al., 2008; Jonscher et al., 2019; Mukherjee et al., 2022; Sabitzki et al., 2023), indicative of a block during endosomal transport (late steps in endocytosis), no such vesicles were observed upon inactivation of K13 and its compartment proteins (Birnbaum et al., 2020), suggesting a role of these proteins during initiation of endocytosis (early steps in endocytosis).

      VPS45 has no apparent spatial connection to the K13 compartment but the fact that MyoF does - and its inactivation also results in vesicle accumulation - indicates that it is downstream of vesicle initiation, providing the first connection from the initiation phase to the transport phase. More evidence for these different steps of endocytosis has been published in a recent preprint from our lab, where we simultaneously inactivated a protein of both “endocytosis steps” (Sabitzki et al., 2023).

      To clarify this in the results as requested, we changed the statement to: (line 256) “Overall, our results indicate a close association of MyoF foci with the K13 compartment and a role of MyoF in endocytosis albeit not in rings and at a step in the endocytosis pathway when hemoglobin-filled vesicles had already formed and hence is subsequent to the function of the other so far known K13 compartment proteins.”

      30) Do the authors mean that it is involved in endocytosis but not in ART resistance? If so, this is a very difficult statement to make since the parasites are dying. Is there any evidence of point mutations in MyoF in the field?

      We split this point to address all issues raised here. Please see response to point 29 which clarifies that this was meant in a different way and our response to point 28 which explains why the dying parasite issue is not expected to affect the RSA (please also note that we do not have evidence of actually dying parasites in the MyoF-2xFKBP-GFP-2xFKBP line, most likely the growth is slowed).

      The mutation issue is interesting. In fact evidence exists that MyoF mutations may be associated with resistance (Cerqueira et al., 2017) (please note that there it is still called MyoC) but in a recent preprint from our lab we did not find any evidence for a significantly changed RSA survival in 12 tested mutations in the corresponding gene (Behrens et al., 2023).

      To clarify this we added the following statement to the discussion (line 709): "Of note, mutations in myoF have previously been found to be associated with reduced ART susceptibility (Cerqueira et al., 2017), but 12 mutations tested in the laboratory strain 3D7 did not result in increased RSA survival (Behrens et al., 2023)."

      31) Line 298: the authors state that there is no growth defect in the first cycle when rapalog is added to the KIC11 line, however based on Figure 3D, there is evidently a 25% reduction in growth compared to - rapalog at day 1 post treatment, and a 60% reduction by day 2, which is still within the 1st growth cycle. The authors should either revise their statement or provide an explanation for these findings. The authors should also explain why their Giemsa data in Fig. 3E is not in accordance with their FACS data.

      We think there is a misunderstanding here, as our figure legend was not detailed enough and we apologise if this had been misleading. The growth effect is restricted to invasion or possibly the first hours of ring stage development (see point 4&5, reviewer 2), which in asynchronous cultures more rapidly takes effect as the culture also contains schizonts that immediately generate cells that re-invade but can't due to inactivation of KIC11 (due to the rapid action of the knock sideways, KIC11 is already inactivated). In contrast, in highly synchronous cultures, this effect can only be evident once the parasites reached the schizont stage (starting with rings this takes close to 2 days). We now clarify that Figure 2E (previously Figure 3D) shows growth data obtained with an asynchronous parasite culture, while in Figure 2F the growth assay is performed with tightly synchronized (4h window) parasites as stated in the Figure legend.

      We now explicitly state in each Figure legend and for each growth experiment throughout the manuscript whether we used asynchronous or synchronized parasites for growth assays.

      Related to this, the incorrect y-axis label of what is now Figure 2E mentioned in major comment #58 is now corrected.

      32) Line 301: KIC11 could also be important very early for establishment of the ring stage for example for establishment of the PV. Also, was mislocalisation assessed in rapalog-treated parasites at 72 hours or in cycle 3?

      This is a valid point and this has now been addressed. We performed an invasion/egress assay revealing similar schizont rupture rates, but significantly reduced numbers of newly formed ring stage parasites (Figure 2H, S3G), indicating an effect of KIC11 inactivation either on invasion or possibly the first hours of ring stage development. A very similar point was raised by Reviewer 2, please see reviewer 2; major comment #4. This is now also reflected in line 302, which now reads: “… indicating an invasion defect or an effect on parasite viability in merozoites or early rings but no effect on other parasite stages (Figure 2F-H, Figure S3F-G).”

      We further included an assessment of mislocalization 80 hours after the induction of knock-sideways by addition of rapalog in Figure S3E which showed mislocalization of KIC11 to the nucleus.

      33) Line 311: the authors should change the sentence from 'not related to endocytosis' to 'not related to endocytosis or ART resistance'.

      Done as suggested.

      34) Line 323-325: Authors say that a nuclear GFP signal can be observed in early schizonts for KIC12. According to the pictures provided in Figure 4A and Figure S5A it is not very obvious. Also faint cytoplasmic GFP signal could only be background as we can see that exposure is higher for schizont pictures

      We changed the sentence (line 339) to: “…nuclear signal and a faint uniform cytoplasmic GFP signal was detected in late trophozoites and early schizonts and these signals were absent in later schizonts and merozoites (Figure 3A, Figure S4A,B).” in order to emphasize that the nuclear signal disappears early during schizont development.

      35) Line 326-328: The authors say that kic12 transcriptional profile indicate mRNA levels peak (no s at peak) in merozoites. Should they show live cell imaging of merozoites then? Because from the Figure 4A schizont pictures where schizonts are almost fully segmented no signal can be observed.

      The observation that mRNA levels of early ring stage expressed proteins tend to increase already in mature schizonts and merozoites is well established (e.g. (Bozdech et al., 2003)). A very good example of this is exported proteins, most of which show a transcription peak in schizonts while the proteins are only detected in rings; see e.g. (Marti et al., 2004). Hence, our observation for KIC12 is quite typical.

      We originally did not include merozoites, as in the last row of Figure 3B fully developed merozoites within a schizont with already ruptured PVM are shown and no GFP signal can be detected in these parasites. We now provide images of free merozoites in Figure S4A-B showing again no detectable GFP signal.

      We thank the reviewer for pointing out the typo, "peak" has been corrected.

      36) Line 347: The authors state that using the Lyn mislocaliser the nuclear pool of KIC12 is inactivated by mislocalisation to the PPM. This tends to suggest that only the nuclear pool of KIC12 is mislocalised. How is it possible that only the nuclear pool is mislocalised?

      The Lyn mislocaliser is at the PPM which is continuous with the cytostomal neck where the K13 compartment likely is found. The effect of the Lyn mislocalizer on the KIC12 protein pool localizing at the K13 compartment is therefore somewhat unclear. For this reason we already had the following statement in the original submission (line 400): “Foci were still detected in the parasite periphery and it is unclear whether these remained with the K13 compartment or were also in some way affected by the Lyn-mislocaliser.” We would like to stress here that the same does not apply to the nuclear mislocaliser, which is only a trafficking signal delivering KIC12 to the nucleus and hence likely does not affect the nuclear pool of KIC12, only the K13 compartment pool (the main interest of this manuscript).

      We realised that the statement towards the end of this paragraph was unnecessarily ambiguous in regards to the K13 compartment pool of KIC12 which might have caused some confusion about the function of this pool of KIC12 and therefore modified it to (line 374): "Due to the possible influence on the K13 compartment located foci of KIC12 with the Lyn mislocaliser, a clear interpretation in regard to the functional importance of the nuclear pool of KIC12 other than that it confirms the importance of this protein for asexual blood stages is not possible. In contrast, the results with the nuclear mislocaliser indicate that the K13 located pool of KIC12 is important for efficient parasite growth." It is also important to note that this limitation does not apply to the NLS knock sideways in regard to the K13 compartment and that the endocytosis function of this pool of KIC12 seems solid, which this statement reinforces.

      37) Line 368-369: Effect was also only partial for MyoF. Why didn't you measure the same metrics for MyoF?

      This was now done and is provided as Figure 1J-K, S2J, confirming our previous interpretation, see also point #27 which raises the same point.

      38) Line 379: you don't know if all proteins acting later in endocytosis will have an increased number of vesicles as a phenotype

      This is based on our current definition as stated in the introduction. It assumes a directional vesicular transport of hemoglobin to the food vacuole where inhibition of early stages will prevent transport before HCC-filled autonomous vesicular containers have formed and entered the cell. In contrast later inhibition stops such containers from further transport, leading to their accumulation. Such an accumulation is visible after inactivation of VPS45 and other proteins (Jonscher et al., 2019; Mukherjee et al., 2022; Sabitzki et al., 2023) or treatment with cytochalasin D (Lazarus et al., 2008). While it is possible that there may be smaller intermediates formed at the K13 compartment that later on unite or fuse with the compartment evident after VPS45 inactivation and these might be missed due to small size (i.e. inhibition of a step between K13 compartment and an early endosome or equivalent), this would still be upstream of the VPS45 induced containers and hence would be earlier. We therefore believe that, based on the framework given in the introduction (see also (Spielmann et al., 2020)), it is reasonable to assume that a phenotype manifesting as reduced food vacuole bloating without formation of detectable vesicles likely signifies inhibition early in the process, whereas reduced bloating with vesicles signifies inhibition later in the process.

      39) Line 413-414: The authors state that no growth defect was observed upon KS of 1365800. Is growth alone enough to say that there is no impact on endocytosis?

      This is an interesting point. The endocytosis proteins we studied so far indicate that efficient impairment of endocytosis manifests as a severe growth defect. Hence, lack of a growth defect can be assumed to be an indicator for absence of an important role for endocytosis (or any other growth relevant process). Clearly there is a gradual response, such as seen in the different MyoF versions resulting in proportional growth and vesicle appearance phenotypes. Hence, a protein with a minor role might have escaped our attention, but then it probably is also not a very important protein in endocytosis.

      To further strengthen our assessment of PF3D7_1365800 importance for asexual blood stage development, we now also generated a cell line expressing the PPM Mislocalizer, enabling knock sideways to the PPM. This was done because this protein consistently has a focus at the nucleus that may be within the nucleus. Again this revealed no growth defect upon inactivation (Figure S7D).

      40) Line 432: in this section, the authors state that KIC4 and KIC5 seem to have domains that may suggest these proteins are involved in endocytosis, based on the alpha fold data that is publicly available. Considering the authors have TGD-SLI versions of these lines (Birnbaum et al. 2020) and have already confirmed in this previous publication that they confer resistance to ART; it would make sense to look at endocytosis for these genes. This would be a relatively simple and straightforward experiment, taking no longer than two to three weeks, and would require no additional reagents or line generation. Doing these experiments would add a lot more weight to this final section. The authors later state that KIC4 and 5 are TGD lines, so not the best for endocytosis assays. It is unclear why this would be difficult to do if an adequate control is contained in the experiment (such as parental 3D7). It explains why they did not perform the MCA2 endocytosis assays further up, but in my opinion, an attempt at doing these assays is important and would significantly increase the impact of this paper. Identical as major comment #17.

      As stated in the manuscript and above, we were originally hesitant to do these assays due to the fact that we can't induce inactivation which is less ideal than comparing the identical parasite population split into plus and minus and is further complicated by the likely smaller effect as the TGDs still permitted growth. However, we see the point of the reviewer and now performed these assays using 3D7 as controls and taking extra care to account for stage differences between the TGD lines and 3D7. However, there was no significant difference in the bloated food vacuole assays with these cell lines. Due to the reasons mentioned in major point 17, we are not sure this indeed means these proteins have no role in endocytosis. One possible reason why we were able to obtain these TGDs may have been because the effect on endocytosis is less than in the essential proteins (or is ring stage specific) and in a TGD an endocytosis defect may therefore not be detectable with our assays (see details and further possible explanations in response to point 17).

      In an attempt to address the TGD issue, we generated knock sideways cell lines for KIC4 and KIC5. Unfortunately, the mislocalization of KIC5 to the nucleus was inefficient (see figure below). As this did not result in a growth defect (in contrast to the clear KIC5-TGD growth defect (Birnbaum et al., 2020)), this line is not suitable to study a potential role of this protein in endocytosis. Therefore, we performed the bloated food vacuole assay only with KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser parasites. However, this revealed no effect on HCC uptake, which is in line with the normal growth of KIC4-TGD parasites (Birnbaum et al., 2020) and suggests that this protein could only have a minor or redundant role in endocytosis (it is the line that shows the smallest effect in RSA). As the KIC4 and KIC5 knock sideways lines did not permit any conclusions, we did not include them in the revised manuscript but they can be found here:

      [Figure KIC4 knock sideways & KIC5 knocksideways]

      Figure legend: (A) Live-cell microscopy of knock sideways (+ rapalog) and control (without rapalog) KIC4-2xFKBP-GFP-2xFKBPendo+ 1xNLS mislocaliser parasites 4 and 20 hours after the induction of knock-sideways by addition of rapalog. Scale bar, 5 µm. Relative growth of asynchronous KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser plus rapalog compared with control parasites over five days. Three independent experiments were performed. Growth of knock sideways (+ rapalog) compared to control (without rapalog) KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser (blue) or KIC5-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser (red) parasites over five days. Mean relative parasitemia ± SD is shown. (B) Live-cell microscopy of knock sideways (+ rapalog) and control (without rapalog) KIC5-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser parasites 4 and 20 hours after the induction of knock-sideways by addition of rapalog. Scale bar, 5 µm. Growth of asynchronous KIC5-2xFKBP-GFP-2xFKBPendo+ 1xNLSmislocaliser plus rapalog compared with control parasites over five days. Four independent experiments were performed. (C) Bloated food vacuole assay with KIC4-2xFKBP-GFP-2xFKBPendo+1xNLSmislocaliser parasites 8 hours after inactivation of KIC4 (+rapalog). Cells were categorized as with ‘bloated FV’ or ‘non-bloated FV’ and percentage of cells with bloated FV is displayed; n = 3 independent experiments with each n=19-30 (mean 21.4) parasites analysed per condition. Representative DIC are displayed. Area of the FV, area of the parasite and area of FV divided by area of the corresponding parasites were determined. Mean of each independent experiment indicated by coloured symbols, individual datapoints by grey dots. Data presented according to SuperPlot guidelines (Lord et al., 2020); Error bars represent mean ± SD. P-value determined by paired t-test. Area of FV of individual cells plotted versus the area of the corresponding parasite. Line represents linear regression with error indicated by dashed line.

      41) Line 490-493: the authors state that the K13 compartment proteins fall in two groups, some that are involved in ART resistance AND endocytosis, and some that have different functions. However, in this manuscript the authors have demonstrated 3 flavours that K13 compartment proteins can come in: • Some that confer ART resistance and are involved in HCCU (MCA2) • Some that are involved in HCCU but not ART resistance (MyoF & KIC12) • Some that are involved in neither (KIC11) The authors should therefore revise this statement.

      We agree that this was not well phrased. To account for the fact that not all endocytosis proteins confer increased RSA survival to the parasites when inactivated, we changed this statement (line 604): "This analysis suggests that proteins detected at the K13 compartment can be classified into at least two groups of which one comprises proteins involved in endocytosis or in vitro ART resistance whereas the other group might have different functions yet to be discovered."

      Generally, we believe that endocytosis is the overarching criterion and we therefore would like to keep the definitions of the main groups (endocytosis or not). As indicated by the title, the focus of the manuscript is on the K13 compartment for which so far endocytosis is the only experimentally associated function. That this group contains proteins that do not confer reduced ART susceptibility when conditionally inactivated (KIC12 and MyoF) is explained by their stage-specificity, making this a subgroup of the overarching endocytosis group.

      We realise that with the endocytosis data on the KIC4, KIC5 and MCA2 TGD there is now also a subgroup for which we were unable to demonstrate an endocytosis effect in trophozoites although its members show changes in RSA survival. However, as indicated above, we would be hesitant to fully exclude some role of these proteins in endocytosis in rings, particularly as a comparably small reduction in endocytosis protein activity or abundance is sufficient to increase RSA survival (Behrens et al., 2023). A principal classification of "endocytosis or ART resistance" or "neither endocytosis nor ART resistance" still accounts for this and therefore seems to us to be the most useful, particularly also in light of our domain identification that then can be linked with one or the other group.

      42) Line 508: the authors state that they expanded the repertoire of K13 compartments, when in fact they functionally analysed them - they did not do another BioID to identify more candidates.

      We respectfully disagree with the reviewer on this point: we did expand the repertoire of known K13 compartment proteins. Only independently experimentally validated proteins from proximity biotinylation experiments can be considered part of the K13 compartment (or any other cellular site or complex). Without validation of the location, the identified proteins can only be considered candidates. This is highlighted in this manuscript by the finding that several proteins of the list did not localize at the K13 compartment.

      43) Line 570-572: has anyone ever tested whether CytoD or JAS treatment in rings, is sufficient to mediate ART resistance? Something similar to what was done in PMID 21709259 with protease inhibitors. If not this would be a pretty interesting experiment for the authors to do that could shed more light on the MyoF data. It would take maybe 2 weeks to do and not require the generation of any new lines. This would clarify whether other Myosins other than MyoF are involved in endocytosis, as is suggested by previous publications (PMID: 17944961).

      We have now included this experiment. In agreement with MyoF not being needed in rings and its inactivation having no effect on RSA survival, there was no increased survival of the parasites in RSA (neither in 3D7 nor in K13 C580Y parasites) after cytD treatment (new part in Figure 1M). We thank the reviewer for pointing out that this experiment might also inform on whether other myosins influence endocytosis in ring stages. We added (line 250): “Similarly, also incubation with the actin destabilising agent Cytochalasin D (Casella et al., 1981), had no effect on RSA survival in 3D7 or K13C580Y (Birnbaum et al., 2020) parasites, indicating an actin/myosin independent endocytosis pathway in ring stage parasites (Figure 1M) and speaking against other myosins taking over the MyoF endocytosis function in rings.”

      44) Line 608: inhibitors targeting the metacaspase domain of MCA2 may inadvertently inactivate other essential parts of the protein. The authors should acknowledge this possibility in the text.

      The inhibitors used in the cited studies (Kumari et al., 2018) are validated metacaspase inhibitors, such as Z-FA-FMK (Lopez-Hernandez et al., 2003). Activity against the other parts of PfMCA2 - which apart from the MCA domain shows no homology to other proteins - is therefore unlikely.

      45) Line 624-625: the authors state that MyoF is 'lowly expressed in rings' - indeed this is the case in their MyoF-2xFKBP-GFP-2xFKBP line which the authors established has defects due to the tag, but it appears from their MyoF-3xHA tagged line that it is expressed in rings. The authors should therefore revise their statement, and be careful of making claims based on their defective line and using fluorescence imaging as their only metric. If they do want to make the statement that it is not there in rings, they should also do a western blot, which is much more sensitive since it amplifies the signal compared to an image of one parasite.

      This comment is related to major point #24. We also would like to stress that while the MyoF-GFP line already shows a phenotype, the impression of defectiveness based on its location is due to a mix up (see major point #23).

      We now provide a comprehensive time course of the MyoF-GFP signal (Figure 1C, S2A) showing that there is no detectable MyoF-GFP signal until the transition from ring to trophozoite stage. As this is all under the endogenous promoter, we do not think the partial functional inactivation caused by the tagging is the reason for the absence of the signal. If anything, we would have expected adding a stably folded structure such as GFP to increase the stability of the protein. The main reason for the discrepancy in MyoF signal in rings between the GFP-tagged line (of note, there is also no detectable MyoF-GFP signal in MyoF-2xFKBP-GFP ring stage parasites (Figure S2B)) and the HA-tagged line likely is that IFA is much more sensitive than live GFP detection (similar to the high sensitivity the reviewer mentions with regard to WB). The lowly expressed MyoF therefore only becomes apparent in the HA-tagged line, owing to the IFA. We therefore believe that 'lowly expressed in rings' is an appropriate description of our results obtained with three different cell lines (MyoF-2xFKBP-GFP-2xFKBP, MyoF-2xFKBP-GFP and MyoF-3xHA). We hope this is sufficiently well reflected in the manuscript, where we write ‘a low level of expression of MyoF in ring stage parasites’, not that it is ‘not there in rings’ (line 174).

      46) Line 635: arguably this is the 3rd variety and not the 2nd (the authors already mentioned 2 types - ones that are involved in HCCU AND ART and those involved in HCCU only). See comment for line 490-493 above.

      See response for major comment #41, we now consistently used "or" instead of "and". See line 490-493 how this was resolved for what previously was line 635.

      47) Line 785: Bloated food vacuole assay/E64 hemoglobin uptake assay method specify that a concentration of 33mM E64 protease inhibitor was used. However, in reference 44, cited in the manuscript, a concentration of 33µM E64 was used. Please confirm if this is just a typo or if 1000x E64 concentration was used which renders the experiment invalid.

      We thank the reviewer for pointing this out, we corrected this typo and will look out for symbol font conversion errors for the resubmission.

      48) Line 788: it is unclear from this section what is considered a bloated food vacuole - is there an area above which the FV is considered bloated? Do the authors do these measurements manually or use an addon in FIJI/ImageJ? What is the cutoff for if a FV is bloated? Please clarify. Additionally, for the representative images + rapalog for Figures 2H and 4H, it would be useful to see where the authors delineate the FV (add a white circle showing what is actually measured).

      The bloated FV assay is well established (Jonscher et al., 2019; Birnbaum et al., 2020; Sabitzki et al., 2023). Although the bloating of the FV is a human judgment call, it is actually quite obvious: bloating appears as an easily spotted bulging of the FV in DIC. As also minor bloating is scored as 'bloated', it is a very conservative assay. Using an add-on to measure this is not straightforward. It is unclear how this bulging effect of the FV in DIC could be spotted by software and, because it is so obvious to human operators, potentially lengthy and complicated efforts to design appropriate machine learning options were not undertaken. The situation faced by the scorer of the assay is evident from Figure S4F-G, which contains close to 50 "on rapalog" cells and close to 50 control cells, giving representative cells from all replicas of bloated FV assays with KIC12. Please note that these images show the most complicated situation as far as bloated assays go, because the phenotype is not 100% (see Figure 3F) compared to e.g. KIC7 inactivation which leads to lack of bloating in almost all cells (see (Birnbaum et al., 2020) Figure 3E), but the difference is nevertheless still obvious. We are aware that in such situations (less than absolute inhibition) this assay scoring of "yes" or "no" is a surrogate for the actual level of inhibition and may be more subjective. This is why in this case we also did the FV size measurements (which are less dependent on human judgment) to further support this and give a better quantifiable measure. Of note, the bloated food vacuole judgments are done "blinded", i.e. the examiner does not know which sample they are looking at.

      In response to this reviewer's point we now also added the FV size refinement of the assay for MyoF inactivation which is one of the cases where inhibition of bloating is not in 100% of the cells (see major comment #27). Please also note here the advantage of the rapidly acting knock sideways technique for these assays which shows the sum of effect 8 h after initiating inactivation and for which we carefully control size of the cells which shows that there is no significant growth reduction over the assay time, excluding secondary effects due to a generally reduced viability. Compared to slower acting systems suggested to have been used instead (see introductory part and significance of this review), the rapid speed of knock sideways reduces the risk of potential pleiotropic or compensatory effects due to the time needed for proteins to be depleted if the gene or mRNA is targeted instead.

      The suggestion to include a ‘white circle’ (raised also as minor comment#27) is useful as an aid to see the food vacuole. However, in contrast to the Figures in (Birnbaum et al., 2020) (where we did add such a circle), we here included the DHE staining images in the figure, labelling the parasite cytosol which readily shows the FV (the FV corresponds to the region where there is no DHE staining). As this shows the position of the FV we would prefer to not obscure the DIC images with additional features to permit the reader to see the difference between bloated or non-bloated food vacuoles and keeping the image as natural as possible.

      49) Line 863-864: this sentence seems to be out of place.

      We thank the reviewer for pointing this out, the details of nucleus staining were moved to the correct part.

      50) Line 875: the authors state that there is a light blue wedge, when the circle consists of grey and black wedges. Please revise this.

      This has been corrected.

      51) Line 1059-1061: it is unclear whether the individual growth curves are different clones or whether they are just the same experiment repeated? If it is the latter, then why are they not combined, as is traditionally done?

      These are the individual replicates of the growth curves shown in Figure 1G of the same cell lines done on a different occasion. We always try to show as much of the primary data as possible and believe that showing individual data points from the different experiments is better than only the combined values which obscure the actual course of each experiment.

      52) Line 919-924: the authors mention a blue and red line, but there is only a black line in figure 3D. Moreover, the experiment of using the LYN mislocaliser was only done for KIC12 according to the manuscript. Additionally, the y axis of the figure states relative growth day 4[%] compared to rapalog, but then on the x axis there are several days. In the text it says there is no growth defect until the second cycle, but from this graph it appears the growth defect is evident as early as 1 day post rapalog treatment. Can the authors please clarify and correct the issues pointed out.

      We thank the reviewer for pointing this out, this was due to a copy & paste error in the figure legend that was now amended. We also fixed the incorrect axis label. For the last part (growth defect) please see detailed answer to Major comment#31 raising the same concern for KIC11 (in synchronous parasites the defect only takes effect once the cells reached the relevant stage whereas in asynchronous cultures there are always cells in the relevant stage that due to the rapid effect of the knock sideways already have a growth phenotype).

      53) Figure 1 panel B & C: the label of the figure where the signal from MCA2Y1344STOP-GFP is shown with the DAPI signal overlayed is deceptive since it suggests that this is the signal of full length MCA2. Please change the label of this panel from MCA2/DAPI to MCA2Y1344STOP/DAPI. The same is true for Panel C for the image labeled MCA2/K13 - please change this to MCA2Y1344STOP/K13.

      Done as requested.

      54) Figure 2B: what stages are these parasites? Please state this in the figure. Based on the MyoF pattern, it looks like rings in the upper panel and trophs in the bottom panel. Why were schizonts not shown?

      Both are trophozoites (early trophozoite in top panel and late trophozoite in bottom panel). This is now labelled in what now is figure 1B. As stated above, schizont stages are less relevant for the topic of this manuscript and in order to prevent the manuscript from getting more disjointed and keeping it more focussed on the main topic, we decided to not include a schizont in the manuscript. Nevertheless, we included an example image below.

      [Figure MyoF_p40px schizont]

      55) Figure 2D&F: it is not very meaningful when growth assays are shown as a final bar after 4 days of growth. It is much more useful and informative to see a growth curve instead (as is shown in the supplementary), since it shows if the defect is apparent in the first growth cycle or later. With the way the data is currently shown, this is not apparent. I would advise the authors to switch the graph in 2F out of a combined graph of all the biological replicates growth curves for S3D - showing error bars.

      While we in principle fully agree with the reviewer on showing the course of the full experiment (which is available in Figure S2E), the key here is to show the overall difference. Hence, we would like to keep this comparison of the overall effect on growth in what now is Figure 1E and G. Showing that the phenotype is actually very consistent (partial inactivation through tagging or further inactivation using knock sideways increases endocytosis phenotypes, correlating with parasite viability) is part of our response to the doubts this reviewer raises about the function of MyoF (mainly in the overall assessment and the significance statement).

      Please also note, that the growth curves upon knock sideways shown in Figure 1G, S2E are performed with asynchronous parasite cultures, which doesn’t allow us to draw direct conclusions about growth cycle effects.

      Nevertheless, we now also included the suggested combined data representation in Figure S2E.

      56) Figure 3: why were the calculations of FV area, parasite area and FV/parasite area only done for KIC12 and not done for MyoF? It would be interesting to see if any of these values are different for MyoF - whether the parasites are smaller in area and therefore FV smaller. Please present them in Figure 2. Images should be already available and would not require further experiments to be done, only the analysis.

      This now has been done (confirming our results) and is included as Figure 1J-K, S2J. This point was also raised as major comment #37, please also see detailed answer there.

      57) Figure 3B: why is there no spatial association assessment for KIC11 and K13 as was done for the MCA2 and MyoF? The authors should show a pie chart showing the degree of association here as was done for the other proteins.

      This is now included in Figure 2C.

      58) Figure 3D: The y axis of the figure states relative growth day 4[%] compared to rapalog, but then on the x axis the experiment takes place over several days. Is this a typo in the y axis? Additionally, the authors state in line 287-290 that the growth defect upon addition of rapalog is only seen in the second cycle, but from this graph it appears the growth defect is already evident 1 day post rapalog addition. The figure legend also does not make sense for this figure since it mentions a blue and a red line, when there is only a black line present. The legend also mentions the LYN mislocaliser which was used for KIC12 not KIC11 (see above).

      We apologise for the inadequate legend and colour issues, this was amended. This point was also raised in major comment #31 and #52, please find detailed answer there.

      59) Figure 3E: the colour for Control and Rapalog 4 hpi are very similar and very hard to discern. Please choose an alternative colour or add a pattern to one of the samples. The y axis is also missing a label. Is this supposed to be parasitemia (%)?

      We thank the reviewer for pointing this out, the missing label is now included and the colour has been adapted to make them better distinguishable.

      60) Figure 4A: the ring shown in this figure does not appear to be a ring (it is far too large and appears to have multiple nuclei?). Do the authors have any other representative images to show instead?

      This is in fact a ring, but we realize that we accidentally included an incorrect size bar in the ring image of Figure 4A (now Figure 3A) (size bar for 63x objective instead of the correct one for the 100x objective); we apologise for this oversight. We don’t think this parasite has multiple nuclei. Instead, the Hoechst signal shows the often elongated nucleus seen in rings, which can appear as two foci in Giemsa stained smears and gives rise to the typical feature of P. falciparum rings used in diagnostics. In order to exclude any doubts about the nuclear localization of KIC12 in rings, we attach here a panel with more examples of KIC12-2xFKBP-GFP-2xFKBP ring stage parasites.

      [Figure KIC12]

      61) Figure 4B: why is there no spatial association assessment for KIC12 and K13 as was done for the MCA2 and MyoF? The authors should show a pie chart showing the degree of association here as was done for the other proteins. This should be done for the different life cycle stages considering the changing localisation of KIC12.

      This is now provided in Figure S4A. As suggested by the reviewer, we independently quantified the association for ring, early trophozoite and late trophozoite stages. As there is no KIC12 signal in schizonts, we did not include a quantification for this stage.

      62) Figures 4C&E: it is extremely important to show the DNA stain in both these samples considering that a portion of KIC12 is in the nucleus! Please add the DAPI signal for these figures (as for all other figures!).

      Please see major comment #64 for a detailed answer why we did not include DNA staining in the imaging used to assess mislocalization upon knock-sideways.

      63) Figure 4E: this figure should be presented before 4D (considering the line being presented in 4E is used in an experiment in 4D). The authors should switch the order of these two.

      We see the point the reviewer is raising here, Figure 4D (now Figure 3D) also contains the data with the Lyn mislocaliser while we first talk about the NLS mislocaliser. This permits a better comparison between the two mislocaliser lines. However, first explaining the Lyn-mislocaliser and then going back to the NLS would make it rather complicated for the reader to follow the storyline and therefore we would like to keep the order as it is. We realise that this means the reader has to go back one figure part for seeing the Lyn growth data, but believe this is worth the benefit that the data is there compared to the NLS result.

      64) It is unclear why in many of the fluorescence images the authors do not show the DAPI signal - particularly when colocalising with K13 and when doing the knock sideways experiments. Please add these images to the figures - I would assume they have already been taken, so would simply involve adding the images to the panel.

      We did not include DNA staining (DAPI or Hoechst) for any of the images used to assess the efficacy of mislocalization, as we would prefer to keep the parasites as representative of viable parasites in culture as possible. Hence they were imaged without DNA stain (these stains are toxic). We would like to point out that a DNA stain is not necessary, as the mislocaliser already marks the nucleus (in the case of the NLS mislocaliser), actually even somewhat more accurately, as it fills the entire nuclear space rather than only the DNA which is marked by DAPI or Hoechst.

      For LYN this admittedly is not the case, there the mislocaliser marks the plasma membrane. However, we think the proper control for efficient mislocalisation is the comparison between the GFP-tagged protein of interest and the mCherry mislocaliser to show mislocalisation, as previously done in our lab (e.g. (Birnbaum et al., 2017; Jonscher et al., 2019; Birnbaum et al., 2020)).

      Due to their toxicity, we also avoided nuclear staining in some other parts of the manuscript when we were of the opinion that a nucleus signal was not necessary.

      65) Throughout the manuscript, there is no western blot confirming the correct size of their modified proteins. This should be provided.

      We did perform Western blot analysis for both MCA2 cell lines. MCA2 is the only gene-product for which we generated a disruption for this work, and together with the severe truncation from previous work, we provided a Western blot-based confirmation of the correct size.

      The MCA2 disruptions are at least partially dispensable for in vitro parasite growth, hence if degradation occurred, this might not have been noticed. In that case we considered it relevant to show that the truncations were of the expected size. The other proteins in the main figures are essential for growth. Hence, if the tagging approach would lead to unexpected changes in protein integrity (which we assume is what was intended by this concern to be assessed with a Western blot), the parasites expressing the tagged MyoF, KIC11 and KIC12 would - due to their importance for asexual blood stage development - not have been obtained. Hence, we can assume the integrity of the tagged protein is very unlikely to have been affected in a functionally relevant way.

      66) None of the figures are appropriate for individuals with colour blindness, limiting their accessibility to the paper. Please change the colour schemes for all fluorescent images using magenta/green or an alternative colour combination appropriate for colourblind individuals.

      We thank the reviewer for this comment. This has now been amended, individual channels of fluorescence microscopy images are now shown in greyscale, while the overlay was changed to green/magenta.

      Minor Comments

      1) line 29: remove 'are'.

      Done.

      2) Line 29: the text says "HCCU is critical for parasite survival but is poorly understood, with the K13 compartment proteins are among the few proteins so far functionally linked to this process." The sentence should be: 'HCCU is critical for parasite survival but is poorly understood, with the K13 compartment proteins among the few proteins so far functionally linked to this process."

      Done.

      3) line 44: remove 'the'

      Done.

      4) Line 48: consider mentioning here that malaria is caused by the parasite Plasmodium - otherwise the first mention of parasite in line 52 is confusing for the non-specialist reader.

      Done.

      5) Line 49: estimated malaria-related death and case numbers are from the 2021 WHO World malaria report. You cite the 2020 WHO World malaria report.

      We now cite the newest WHO report.

      6) Line 53: please insert the word 'have' between now and also.

      Done.

      7) Line 54: please change 'was linked' to is linked

      Done

      8) Line 72: I would specify that free heme is toxic to the parasite. Especially as you mention that hemozoin is nontoxic.

      Sentence would be "where digestion results in the generation of free heme, toxic to the parasite, which is further converted into nontoxic hemozoin"

      Done.

      9) Line 90: authors should either say "in previous works" or "in a previous work"

      The text has been altered to say: “ in a previous work”.

      10) Line 91: "We designated these proteins as K13 interaction candidates (KICs)"

      Done.

      11) Line 95: please change 'rate' to number

      Done.

      12) Line 109: Please include a comma before (ii).

      Done.

      13) Line 112: as shown by Rudlaff et al in the paper you are citing, PPP8 is actually associated with the basal complex. You can say that "(ii) were either linked or had been shown to localise to the inner membrane complex (IMC) or the basal complex (PF3D7...)."

      Done.

      14) Line 114: Protein PF3D7_1141300 is called APR1 in the manuscript but ARP1 in Supplementary Table 1. Please correct.

      Done.

      15) Line 131: please define SNP - this is the first use of the acronym.

      Done.

      16) Line 133-134: South-East Asia instead of "South Asia"

      Done.

      17) Line 135: please explain what TGD is - it is referred to over and over again in the manuscript without ever being explained.

      We apologise for this oversight. We now explain what is meant with TGD at the suggested point of the manuscript.

      18) Line 145: change 'Western blot' to western blot - only Southern blot is capitalised since it is named after an individual, while the other techniques are not.

      To the best of our knowledge this issue has not been resolved: some journals capitalize the “W” (e.g. Science), while others don’t (e.g. Nature). We would prefer to continue to capitalize the “W”, as this is consistent with the original publication (Burnette, 1981), but if there are strong objections, we would be happy to change this.

      19) Line 152: add "the" between 'and spatial'

      Done.

      20) Line 158: please define SLI as selected linked integration, since it is the first use of the acronym.

      Done.

      21) Line 178: introduce a comma after protein. Sentence should be "Proliferation assays with the MCAY1344STOP-GFPendo parasites which express a larger portion of this protein, yet still lacking the MCA domain (Figure 1), indicated no growth ..."

      Done.

      22) Line 195: the authors could mention that MyoF was previously called MyoC in the Birnbaum 2020 paper. I wanted to check back in the Birnbaum 2020 paper and could not find MyoF

      Good point, this was done.

      23) Line 200: "Expression and localisation of the fusion protein was analysed by fluorescent microscopy". Why was expression not also analysed by western blot, as for MCA2?

      Please see major comment #64 for a detailed answer.

      24) Line 204: I could not find any mention of MyoF (Pf3D7_1329100) in reference 65. Please remove reference 65 if not correct. Also reference 66 looks at Plasmodium chabaudi transcriptomes so I would specify that "This expression pattern is in agreement with the transcriptional profile of its Plasmodium chabaudi orthologue"

      Reference 65 (Wichers et al., 2019) provides an RNAseq transcriptome dataset for asexual blood stage development of 3D7 (originating from the same source as the 3D7 used in this study). While Ref 66 (Subudhi et al., 2020) indeed contains transcriptomic data from P. chabaudi, the authors also provide a nice 2h window RNAseq transcriptome dataset for asexual blood stage development of Plasmodium falciparum. Both datasets are therefore suitable as reference for the statement about the myoF transcription pattern. Both datasets are also easily accessible and show the pattern in a graph in PlasmoDB.

      25) Line 208: Please indicate a reference for P40 being a marker of the food vacuole

      Done.

      26) Line 220-224: The authors should consider changing to " Taken together these results show that MyoF is in foci that are mainly close to K13 and, at times, overlapping, indicating that MyoF is found in a regular close spatial association with the K13 compartment."

      The suggested wording introduces "mainly" for "frequently" and likely was in part motivated by the discrepancy in location between cell lines that we hope we now could clarify to be only minor (see major point #23). We therefore think the original wording appropriately summarises the findings (line 178): “Taken together these results show that MyoF is in foci that are frequently close or overlapping with K13, indicating that MyoF is found in a regular close spatial association with the K13 compartment and at times overlaps with that compartment.”

      27) Line 255: In Figure 2H, and subsequent figures showing bloated FV assay, I would delineate the food vacuole with dashed line as in Birnbaum et al. 2020 to help the reader understanding where the food vacuole is.

      In contrast to the Figures in Birnbaum et al. 2020, we here included the DHE staining (parasite cytosol) in images of bloated FV assays which visualizes the FV. We therefore decided to avoid any further marking, to keep the image as unprocessed as possible (see also major point 48).

      28) Line 265-266: Here the title says that KIC11 is a K13 compartment associated protein, but the title of Figure 3 says KIC11 is a K13 compartment protein. I noticed that you make the difference between K13 compartment protein and K13 compartment associated protein for MyoF for example which is not clearly associated with the K13 compartment. Which one is it for KIC11?

      The interpretation of the reviewer is correct, we indeed graded this subconsciously based on level of overlap. Based on the newly added quantification shown in Figure 2C, we describe KIC11 now as K13 compartment protein.

      29) Line 309-310: indicate a reference for your statement "which is in contrast to previously characterised essential K13 compartment proteins".

      Done, we now included Birnbaum et al. 2020 as reference for this.

      30) Line 377: Figure 4I, please correct 1st panel Y axis legend

      Done.

      31) Line 404: replace "dispensability" with dispensable

      Done.

      32) Line 416: can the authors provide any speculation as to why they observed these proteins as hits in the BioID experiments?

      As some of these proteins were less well or less consistently enriched, they could be background of the experiment. Alternatively, some could be proteins that only transiently interact with the K13 compartment.

      33) Line 451: Where the "97% of proteins containing these domains also contain an Adaptin_N domain and function in vesicle adaptor complexes as subunit a" come from. Do you have a reference?

      The statement now includes references and reads (with small changes to original submission): "More than 97% of proteins containing these domains also contain an Adaptin_N (IPR002553) domain (Blum et al., 2021) and in this combination typically function in vesicle adaptor complexes as subunit α (Hirst and Robinson, 1998; Traub et al., 1999) (Figure 5D) but no such domain was detectable in KIC5."

      34) Line 465-467: the same could be said for KIC4 as it also has a VHS domain.

      The critical issue is the combination of domains and their position within the protein. While KIC4 also contains a VHS domain, the VHS domain in KIC4 is N-terminal, not in a central position, and it is also not the first structural domain to be identified in KIC4. The similarity to adaptin domains was already described (Birnbaum et al., 2020; also annotated in PlasmoDB) and these domains are also involved in vesicle formation and trafficking. These aspects of the statement can therefore not be extended to KIC4. With regard to VHS domains being involved in vesicle trafficking, this is already stated in line 538: «KIC4 contained an N-terminal VHS domain (IPR002014), followed by a GAT domain (IPR004152) and an Ig-like clathrin adaptor α/β/γ adaptin appendage domain (IPR008152) (Figure 5A-C, Figure S8). This is an arrangement typical for GGAs (Golgi-localised gamma ear-containing Arf-binding proteins) which are vesicle adaptors first found to function at the trans-Golgi (Dell’Angelica et al., 2000; Hirst et al., 2000).»

      35) Line 477-479: Can be rephrased to "However, we found this protein as being likely dispensable for intra-erythrocytic parasite development and no colocalisation with K13 could be demonstrated, suggesting a limited role for PF3D7_1365800 in endocytosis. Or something like that. Makes it clearer.

      We rephrased this sentence and it now reads (line 592): “However, we found this protein as being likely dispensable for intra-erythrocytic parasite development and no colocalisation with K13 was observed, suggesting PF3D7_1365800 is not needed for endocytosis.”

      36) Line 535: Have AP-2a or AP-2b been shown to be at the K13 compartment?

      AP2µ is at the K13 compartment (Birnbaum et al., 2020). Adaptor complexes are heterotetramers whose subunits do not typically function on their own, and this is conserved across evolutionarily distant organisms. In agreement with this also being the case in P. falciparum, Henrici et al. (Henrici et al., 2020a) showed that both AP-2a and AP-2b were present in an AP2µ Co-IP, indicating that the AP2 complex consists of the ‘classical’ subunits in P. falciparum. Therefore, the presence of all subunits at the K13 compartment is very likely, although this has only been experimentally confirmed for AP2µ. Of note, for Toxoplasma gondii the presence of AP-2a and AP-2b at the micropore has been experimentally confirmed (Wan et al., 2023; Koreny et al., 2023) and interaction suggested by presence in the same IP as DRPC (Heredero-Bermejo et al., 2019).

      37) Line 569: reference 43 is wrong

      We thank the reviewer for pointing this out; we removed Ref 43.

      38) Line 746: typo "ot" instead of or.

      Changed.

      39) Line 801: method for Domain Identification using AlphaFold specify that RMSDs of under 5Å over more than 60 amino acids are listed in the results. However, there is a typo in Figure 5B for KIC5 where it says "RMSD 4.0 Å over 8 aa". Please correct.

      Done. In addition, we have now applied a more stringent cut-off of 4 Å over more than 60 amino acids to ensure a higher reliability of our hits. This decision was based on results from our preprint (Behrens and Spielmann, 2023). Because of this, the phosphatase domain in KIC12 is no longer included in this manuscript and accordingly the following sentence has been deleted: “In KIC12 we identified a potential purple acid phosphatase (PAP) domain. However, with the high RMSD of 4.9 Å, the domain might also be a divergent similar fold, such as a C2 domain, which targets proteins to membranes.”
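      The effect of tightening the cut-off can be illustrated with a minimal sketch. It assumes each structural hit is summarised by an RMSD and an aligned length, and that "4 Å over more than 60 amino acids" means RMSD strictly below the threshold and aligned length strictly above 60. The class `DomainHit`, the function `passes_cutoff`, and the aligned length used for the KIC12 hit are hypothetical and only serve the illustration; the RMSD of 4.9 Å is the value quoted above.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    protein: str           # query protein, e.g. "KIC12"
    domain: str            # name of the structurally similar fold
    rmsd: float            # RMSD of the structural alignment in Å
    aligned_residues: int  # number of aligned amino acids


def passes_cutoff(hit: DomainHit, max_rmsd: float = 4.0, min_residues: int = 60) -> bool:
    """Keep a hit only if its RMSD is below max_rmsd over more than min_residues."""
    return hit.rmsd < max_rmsd and hit.aligned_residues > min_residues


# RMSD 4.9 Å is taken from the response; the aligned length of 100 is a
# hypothetical placeholder because the response does not state it.
kic12_pap = DomainHit("KIC12", "purple acid phosphatase", rmsd=4.9, aligned_residues=100)

kept_under_old_cutoff = passes_cutoff(kic12_pap, max_rmsd=5.0)  # original 5 Å threshold
kept_under_new_cutoff = passes_cutoff(kic12_pap, max_rmsd=4.0)  # more stringent 4 Å threshold
```

      Under these assumptions the KIC12 hit is retained with the original 5 Å threshold and dropped with the 4 Å threshold, which matches the removal described above.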

      40) Line 856: In Figure 1E, please use the same Y axis legend as in Figure 2D "relative growth at day 4 [%] compared with 3D7"

      Done.

      41) Figure S1: Some PCR gels check for integration are presented as 5', 3' and ori whereas other gels are presented as ori, 5' and 3'. This is confusing.

      We agree that ideally the order of sample loading should be consistent and we apologise for this. The explanation for this is that these gels were run by different people at different times before we were able to better standardize the loading scheme. However, in the interest of not unnecessarily using resources for something that has a similar meaning, we would prefer not to repeat these PCRs and re-run them only for consistency reasons (as the conclusion is not affected by the different loading schemes).

      42) Figure S1: Why was the expression of only MCA2 was verified by Western blot? What about the other proteins?

      See response to major comment 56.

      43) Line 493: Considering KIC11 was not involved in HCCU or ART resistance it might be worth mentioning in this section that it is of note that there are no domains detected that would be involved in endocytosis.

      We agree that this is the case, however it is also the case for all other proteins that either are not involved in endocytosis and/or lowered susceptibility to ART. We therefore now added a summary statement addressing this in line 602: “In contrast, the K13 compartment proteins where no role in ART resistance (based on RSA) or endocytosis was detected, KIC1, KIC2, KIC6, KIC8, KIC9 and KIC11, do not contain such domains (Figure 5E).” We did not add this at the suggested part of the manuscript as at that point the domain search results are not yet introduced and doing this each time for all the individual proteins would disconnect the flow of the manuscript.

      44) Line 503-506: is it wise to generate more drugs that target a pathway that is already highly susceptible to mutations? The authors should add a statement explaining how this might be avoided.

      The only protein for which mutations do not have a large fitness cost is K13 (see also our preprint on the fitness cost of the ubp1 mutation (Behrens et al., 2023)), and even with K13 the level of resistance seems to be limited by amino acid deprivation when endocytosis is reduced (Mesén-Ramírez et al., 2021). We therefore do not think that this pathway is particularly prone to mutations. Further, the number of commercial drugs targeting the "endproduct" of endocytosis (hemoglobin digestion and detoxification of heme) highlights it as the most prominent vulnerability for drug-based intervention, if we go by the number of commercially available drugs acting on targets associated with a single process.

      45) Throughout, scale bars are stated in the figure legends at the end of the legend. This is a slightly confusing format. The authors should consider stating the scale bar for each sub-legend where a fluorescence image is taken.

      Done.

      **Referees cross-commenting**

      After reading reviewer 2 and 3's comments, I think there are significant overlaps in the key points raised in terms of questions about fusion proteins and their potential partial mis-localisation, better description of results and target selection. Overall I think we agree that the work has potential, but in its current form does not represent a major advance. It would be immensely helpful if the manuscript would be carefully edited for a better flow and linear description of results.

      We now rearranged the manuscript for better flow but would like to highlight that the many requests for smaller experimental issues (and "better description of results") worked somewhat in the opposite way of a more linear description. We hope the rearranged version acceptably balances these two issues. The issues raised in regards to target selection and potential partial mis-localisation are addressed in our responses mainly to this reviewer. Please also see comments on systems used at the end of the rebuttal.

      Reviewer #1 (Significance (Required)):

      The authors set out to test whether other proteins that are in the vicinity of K13 are involved in mediating ART resistance and endocytosis. This is an interesting question. However, other than MCA2 which was already known to be involved in mediating ART resistance (and was not tested for its involvement in endocytosis), none of their candidate proteins seem to be involved in mediating both these functions. The authors show that the other proteins tested appear important for parasite growth, with KIC12 and MyoF involved in mediating endocytosis. While these findings are novel, the KS approach used by the authors casts some doubt over the findings, and would mean that these findings would have to be re-tested with a more reliable approach, such as the GlmS system or generating a conditional knockout using the DiCre system. Despite not advancing our understanding of ART resistance, or identifying further players involved in this process, this manuscripts provides two candidates that are involved in mediating endocytosis and a further candidate that appears to be important for parasite growth. Further work on these proteins will be required to understand their exact roles. As stated above, there is currently limited interest for these results (limited to researchers working on endocytosis in apicomplexan parasites and possibly the wider endocytosis field from an evolutionary perspective), however with further work, this could increase the impact and interest of this work substantially.

      The authors do not describe any novel methods/approaches within this work.

      In the significance statement the reviewer indicates that other systems would have been more reliable for the work here. This is addressed in our response above and in detailed considerations on the properties of conditional inactivation systems at the end of the rebuttal. The systems used in this work were not only chosen because they permit rapid targeting of many different proteins, but because they have merits that are beneficial for our assays. In fact, many of the functional assays in this manuscript are difficult or impossible to carry out with the suggested conditional inactivation systems (please note that we have extensive experience with the systems considered preferable:

      • DiCre (Birnbaum et al., 2017; Mesén-Ramírez et al., 2019; Mesén-Ramírez et al., 2021; Wichers et al., 2022; Kimmel et al., 2023)

      • glmS (Wichers et al., 2021c; Wichers et al., 2021a; Wichers et al., 2022; Wichers-Misterek et al., 2023)).

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In a previous publication the Spielmann lab identified the molecular mechanism of ART resistance in P. falciparum by connecting reduced levels of the protein K13 to decreased endocytosis (uptake of hemoglobin from the RBC cytosol), which results in reduced ART susceptibility. Using quantitative BioID the authors further identified proteins belonging to a K13 compartment, highlighting an unusual endocytosis mechanism.

      In the present manuscript the authors follow up on this work and closely examine ten more proteins of the K13/Eps15-related "proxiome". They successfully link MCA2 to ART resistance in vitro, while the proteins MyoF and KIC12 are involved in endocytosis but do not confer in vitro ART resistance when impaired. They further characterize one candidate (KIC11) that partially colocalizes with K13 in trophozoites but to a lesser degree in schizonts. Growth assays suggest an important function for KIC11 in late stages of the intraerythrocytic developmental cycle. Five analyzed proteins however do not colocalize with the K13 compartment, while a sixth was refractory to endogenous tagging.

      Using AlphaFold predictions of the KIC protein structures the authors identify domains in most constituents of the K13 compartment, highlighting vesicle trafficking-related features that were not identified on primary sequence level before.

      The combination of functional data together with structure predictions leads them to propose a refinement of the K13 compartment as being divided into proteins participating in endocytosis and proteins that have an unknown function.

      We thank the reviewer for the assessment of the manuscript and the constructive comments.

      Major comments:

      1) -Table 1 is missing

      We apologise for this mistake; Table 1 is now included.

      2) -Lines 117-123: Given the total list of uncharacterized candidates encompasses 13 proteins, can the authors give the reason why only the top 10 and not all 13 were characterized in this study?

      A similar point has been raised by Reviewer 1 in major comment #12, please see our response there for an explanation why we chose which targets.

      3) -Line 174: 20% of observed MCA2 foci show no overlap with K13 and 21% only partially overlap, can the author confirm that the observed MCA2 foci in schizonts are the ones that co-localize with K13. (Addition of a schizont stage image in Fig 1C would be sufficient).

      We now extended Figure 4C with images of MCA2-Y1344STOP-GFP+mCherryK13 parasites covering the schizont and merozoite stage, showing that the majority of the MCA2 foci in schizonts are also mCherry-K13 positive.

      4) -The localization and observed phenotype of KIC11 is interesting but unfortunately the authors do not explore it further. Does KIC11 localize with markers of e.g. the secretory organelles (micronemes or rhoptries) in schizonts and could therefore be involved in RBC invasion?

      While we intended to focus mainly on the endocytosis aspect of these proteins, we see the reviewer's point and now generated new cell lines enabling assessment of spatial association of KIC11 with markers for rhoptry (ARO), micronemes (AMA1), and inner membrane complex (IMC1c). This revealed that the KIC11-GFP signal in schizonts does not overlap with apical organelle markers and the signal does not resemble a typical apical localization. In addition, we assessed all three organelle markers after inactivating KIC11 by knock sideways which showed that KIC11 inactivation has no apparent effect on the appearance of these markers, suggesting no major alterations in schizont morphology in respect to apical markers. These results are now presented as Figure S3A and in line 304 of the results.

      5) Can the author distinguish if KIC11 is involved in RBC invasion or in establishment of the ring-stage parasite?

      To look into this, we performed egress/invasion assays, quantifying schizont and ring stage parasites in tightly synchronized parasites at two different time points (pre-egress: 38-42 hpi and post-egress: 46-50 hpi). This revealed a significant decrease in newly formed ring stage parasites per ruptured schizont in parasites with inactivated KIC11, while the egress efficacy remained unaffected. This indicated an invasion or very early ring stage development defect (new Figure 2H, Figure S3G). Determining at which point exactly the phenotype occurs (i.e. during invasion or early after invasion) would require extensive experimentation that goes beyond the scope of this study (e.g. invasion assays using video microscopy with a representative number of parasites or sophisticated flow-based quantification assays). We hope that excluding an egress defect and gross changes of the apical organelles, together with the reduced number of newly formed rings (indicating an invasion or a very early ring-establishment phenotype), will sufficiently narrow down the phenotype for labs interested in invasion to answer this question more definitively.

      Minor comments:

      1) Table S1: Please add the criterion for the order of proteins (abundance in "proxiome"?) in the table as a separate column. I would also suggest adding a new column that highlights the 10 proteins investigated in this study as I found the color-coding slightly confusing.

      Done as suggested: we now include the “average log2 Ratio normalized Kelch13” values from the four DiQ-BioID experiments performed with K13 in (Birnbaum et al., 2020), as well as the suggested column to highlight the investigated proteins. Please also see reviewer 1 major point # 12 for additional information on the selection criteria and how this was added to the manuscript.

      2) -154-155: There is a discrepancy between the text and Fig1C regarding the % of partial overlapping and non-overlapping foci.

      We thank the reviewer for pointing this out, this was corrected.

      3) -The y-axis label is missing in Fig 3E

      Done.

      4) -Fig 4I left graph, the superscript 2 is missing in μm2

      We thank the reviewer for pointing this out, this is now changed.

      5) -Did the author colocalize KIC11 in schizonts with other proteins found in the K13 compartment group of proteins not involved in endocytosis/ART resistance? This may help to further subgroup these proteins.

      This is an interesting point but would actually be technically challenging to do. For this we would need to generate a KIC11endo parasite line for each of these KICs and then do co-localisation in schizonts. However, the outcome of this likely would not be very clear. The reason for this is as follows. There are foci of KIC11 that do overlap with K13 in schizonts. One can expect that these foci show KIC11 at the K13 compartment and that the other KICs would overlap with KIC11 in these K13 foci in schizonts. Hence, we would also need to see K13 to find the non-K13 compartment KIC11 foci and see if these contained the KIC of interest. This is technically challenging because it would mean we would need a third fluorescent protein which is not that trivial to do. Due to the difficulty to do this and the large amount of work involved and the already considerable amount of data in this manuscript, we believe this will be better suited for a different study.

      6) -As a general comment: to make the beautiful IFAs more accessible to a broader readership, I would encourage the authors to switch the color-coding to green/magenta/blue or an equivalent color system or add grayscale images.

      This was done as suggested, all fluorescence images are now provided as greyscale images and the overlays are shown in magenta/green.

      Reviewer #2 (Significance (Required)):

      Characterizing the molecular components involved in Plasmodium endocytosis will not only reveal interesting biology in these highly adapted parasites, but will more importantly lead to a better understanding and potentially open new avenues for intervention of ART resistance. The here presented manuscript is a carefully executed follow-up on previous work done in Dr. Spielmann's lab focusing on the K13 compartment. The authors use established assays to characterize novel components and reveal three new players in endocytosis with one mediating ART resistance in vitro. The proposition that parts of the K13 compartment have a function other than endocytosis is interesting, but will have to await more data from future studies. Taken together this manuscript adds significantly to our understanding of endocytosis in P. falciparum.

      This work is of interest for cell and molecular biologists working on Apicomplexa, but especially for the Plasmodium community.

      We thank the reviewer for this positive assessment.

      I am a cell and molecular biologist working on Toxoplasma gondii

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors characterized 4 proteins from P. falciparum via cellular (co-)localization, endocytosis, parasite growth, and artemisinin resistance assays. These proteins have been identified as candidates for Kelch13 compartment and a possible role in endocytosis in their previous work with quantitative BioID for potential proximity to K13 and Eps15 (Birnbaum et al. 2020). In the current work, additional 6 proteins were not confirmed as being associated to the K13 compartment. This experimental work was complemented by an in-silico analysis of protein domains based on AlphaFold algorithm. For this protein structure evaluation all proteins were chosen, which were experimentally confirmed to be linked to the K13 compartment in the current publication and previous work. With the work 3 novel proteins linked to artemisinin resistance or endocytosis could be functionally described (KIC12, MCA2, and MyoF) and a number of hypotheses were generated.

      We thank the reviewer for the assessment of the manuscript and the constructive comments.

      Major comments:

      The quality of the presented work is solid, the experimental design is adequate, and methods are presented clearly. The publication contains a lot of results both presented in text and in the figures and it is not always straight forward for the reader to follow the descriptions due to many details presented and a lack of context for some of these experiments.

      We thank the reviewer for this overall positive assessment.

      We now reordered the results section in an attempt to increase the flow of the manuscript. We also made changes to improve the context for the results. Given the further (very valid) requests for data on schizonts and invasion, there was an increased danger of a less linear manuscript, which we hope to have acceptably managed with the rearrangement.

      Specific suggestions for consideration by the authors to improve the manuscript. Abstract: 1) R 31: Mention how the 4 proteins were identified as candidates, you need to refer to previous work to clarify this

      To clarify this the sentence was changed to (line 31): "Here we further defined the composition of the K13 compartment by analysing more hits from a previous BioID, showing that MyoF and MCA2 as well as Kelch13 interaction candidate (KIC) 11 and 12 are found at this site."

      2) R38: "Second group of proteins" is confusing - different from the 4 mentioned above? Significance to endocytosis unclear. Please unify terminology in the manuscript, see also comment below on proxiome.

      We changed the wording to clarify the group issue in the abstract as follows (line 34): “Functional analyses, tests for ART susceptibility as well as comparisons of structural similarities using AlphaFold2 predictions of these and previously identified proteins showed that canonical vesicle trafficking and endocytosis domains were frequent in proteins involved in resistance or endocytosis (or both), comprising one group of K13 compartment proteins. While this strengthened the link of the K13 compartment to endocytosis, many proteins of this group showed unusual domain combinations and large parasite-specific regions, indicating a high level of taxon-specific adaptation of this process. Another group of K13 compartment proteins did not influence endocytosis or ART susceptibility and lacked detectable vesicle trafficking domains. We here identified the first protein of this group that is important for asexual blood stage development and showed that it likely is involved in invasion.”

      3) Abstract can only be understood after reading the full publication

      We attempted to amend this by expanding the abstract, particularly the changes highlighted in the previous two points.

      Results: 4) Table 1 is missing from the submitted materials

      We apologise for this mistake. Table 1 is now included.

      5) Consider to shorten and stratify the result section to focus on the significant data

      We rearranged the results in an attempt to streamline this section and are now starting with MyoF in the revised manuscript. However, as highlighted by the requests from reviewer 1, many details need to be available to support our conclusions. For instance the fact that GFP-tagging partially inactivated MyoF asked for further data to support our conclusion (HA-tagged version, showing that the location of the GFP-tagged version was consistent with the HA-tagged version, showing to what extent the different constructs affected growth and correlated with number of vesicles and bloating, see new figure 1M) or that KIC12 has two locations. Overall, we are therefore hesitant to remove data or description from the result part.

      6) Unclear how the localization and functionalization assays might be impaired by the fusion proteins. Significance of ART resistance assay is not clear, in presence of strong growth effects due to inactivation or truncation of genes/proteins

      As indicated also in the example given in the previous point (this reviewer #5), the use of different cell lines (GFP-tagged live cells and small epitope tag in IFA) for targets with an indication for an effect of the tagging confirm that the location we assigned is reasonable. In the case of MyoF, the HA-tagged line, the partial inactivation due to GFP and the further inactivation in the GFP-tagged line by knock sideways show plausible increase of phenotypes (vesicle accumulation and bloated FV assays). Thereby the GFP-tagged line can be seen as a partial inactivation line that further supports our conclusions and overall this paints a consistent picture of the function of this protein in endocytosis (see new Figure 1M better illustrating this). Please note that the difference in location shown by this line compared to the HA-tagged proteins is only small (see also reviewer 1 major point 23ff). See also general discussion on tags at the end of this rebuttal.

      Significance of ART resistance assay: The ‘ART resistance assay’ is done comparing +/- ART (DHA) in identical parasites (originating from the same culture and the same condition). Hence, any growth effects are cancelled out and effects in reducing ART susceptibility would - if at all - be underestimated (see more detailed response to point 28, reviewer 1 and controls in Birnbaum et al., 2020 where we tested an unrelated essential protein, unrelated chemical insult and rapalog on 3D7 and did not detect any effect on RSA survival).
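      The normalisation argument can be made concrete with a short sketch. It assumes RSA survival is expressed as the parasitemia of the DHA-treated sample relative to the untreated control from the same culture, and that a growth defect scales both samples by the same factor. The function name `rsa_survival` and all numbers are hypothetical and chosen only for illustration.

```python
def rsa_survival(parasitemia_dha: float, parasitemia_control: float) -> float:
    """RSA survival in percent: DHA-treated relative to the untreated control."""
    return 100.0 * parasitemia_dha / parasitemia_control


# Hypothetical parasitemia values (%) for a line without growth defect.
dha, control = 0.2, 2.0

# Hypothetical growth defect that halves parasitemia in BOTH samples,
# because treated and untreated parasites derive from the same culture.
growth_factor = 0.5

survival_normal = rsa_survival(dha, control)
survival_growth_defect = rsa_survival(growth_factor * dha, growth_factor * control)
```

      Because the growth factor appears in both numerator and denominator, the two survival values are identical under this simplifying assumption; only a change specific to the DHA-treated sample alters the readout.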

      MCA 7) Stratify results, order by significance of findings, it appears to be described in chronological order, improve readability/flow, eg ART resistance is mentioned in r138, but only reported in r183ff

      We attempted to stratify, but then the reason for generating the partial MCA2 disruption parasite line becomes very arbitrary and would leave the reader wondering why we at all truncated the protein at two thirds of the protein. Hence, we do not see a way around this chronological reporting. However, this part is now not at the start of the experimental results section anymore, possibly making it overall a bit more palatable.

      MyoF 8) R195 to 197 - consider moving to discussion as it is distracting here

      This was shortened and additional information (asked for by reviewer 1, major point 22) to clarify that MyoF was previously called MyoC, was added (line 147): “The presence of MyosinF (MyoF; PF3D7_1329100 previously also MyoC), in the K13 proxiome could indicate an involvement of actin/myosin in endocytosis in malaria parasites.”

      9) Term proxiome is introduced above, but not used in result section - suggest to unify language, eg r195 uses "K13 compartment DiQ-BioIDs" instead, which is not very convenient for the reader

      We carefully reviewed this and made this more consistent.

      10) What is the enrichment factor? Please provide for this and the following proteins, eg in Table 1

      The enrichment factor is log2 enrichment over control and this is now provided in table S1 (see also detailed answer for Reviewer 1 major point 12).

      11) R225 to 243 - overall significance of the growth experiments with mislocaliser is not clear, consider removing from manuscript or explain relevance more clearly

      See also point 28, reviewer 1: This experiment is actually quite important. It shows that if we conditionally inactivate the GFP-tagged MyoF, the growth is further reduced, as stated in line 208. It might have been confusing that the mislocalisation is only partial, but this is equivalent to a partial knock down and hence is useful. This becomes even more relevant with the specific assays following in the next paragraph: while the tagging of MyoF already resulted in vesicles, conditional inactivation with KS generated even more vesicles, showing that the same phenotype was rapidly increased when MyoF was further inactivated by a different means and this also correlated with growth. Hence, this is actually a very consistent phenotype that despite some shortcomings of the tools available to analyse this protein (due to the partial inactivation by the GFP tag) in our eyes looks very convincing. We now added a graph showing the correlation of growth and phenotypes to illustrate this (Figure 1L).

      We also tried to make this clearer by changing line 200 to: “Hence, conditional inactivation of MyoF further reduced growth despite the fact that the tag on MyoF already led to a substantial growth defect, indicating an important role for MyoF during asexual blood stage development.” And line 208 to: “This was even more pronounced upon conditional inactivation of MyoF by KS (Figure 1H), suggesting this is due to a reduced function of MyoF.”

      12) KIC11/KIC12 Enrichment factor?

      The enrichment (“average log2 Ratio normalized Kelch13” from Birnbaum et al. 2020) is 1.65 for KIC11 and 1.32 for KIC12, which is now also explicitly shown in column D of Table S1.

      **Referees cross-commenting**

      I would like to applaud reviewer #1 for a great, very thorough review and lots of detailed suggestions. I agree with the conclusions mentioned in the significance evaluation from reviewer #1 and #2: the work presented does not contain novel methods and the scope is rather narrow with the current results. (I am working on clinical studies with novel antimalarial agents)

      Reviewer #3 (Significance (Required)):

      On the one hand, the authors have wrapped up some of the remaining protein candidates of the K13 compartment and could verify 4 of 10 proteins. The work is of interest for the scientific community working on endocytosis and malaria drug resistance mechanisms. Overall, the conclusions and findings from the previous work, Birnbaum et al. 2020, could be confirmed and extended mainly using the methods previously described. On the other hand, the authors made use of progress in protein structure predictions and identified domains linking the K13 compartment proteins to putative functions. The overlaid protein folds of the newly identified domains in figure 5 look convincing, but I can't comment on the technical details or cut-off used for this in-silico analysis.

      Extended general remarks on the systems used for this work:

      Mainly reviewer 1 suggests (in the general comments and the significance statement) that other systems, namely glmS and diCre, would have been better suited for this work, and also has concerns about the large tag, which is seconded by a comment of reviewer 3. In light of this, we here provide some extended considerations on the properties of conditional systems and tagging with regard to the goals of this work.

      We would like to point out that we do have experience with the systems considered better-suited by the reviewer (one of the first authors has extensively used glmS (Wichers et al., 2021c; Wichers et al., 2021a; Wichers et al., 2022; Wichers-Misterek et al., 2023) and our lab was one of the first to adopt the diCre system in P. falciparum parasites and we regularly use it (Birnbaum et al., 2017; Mesén-Ramírez et al., 2019; Kimmel et al., 2023)). Clearly, these methods have a lot of strengths but there are a number of issues to be considered for the assays we use in this work (see the next section on conditional inactivation systems). In a nutshell, we believe diCre would give a more reliable readout of the absolute level of "essentiality" (i.e. importance for growth) but is unsuitable or at least difficult to use for the assays that reveal the function of our interest in this work. GlmS basically combines the drawbacks of diCre and knock sideways and hence for most targets is not expected to give a better readout of the level of "essentiality" but is similarly difficult to use for our specific assays. The fact that both of these systems can be used without adding a tag to the target may be an advantage, but without a tag one loses some very important features that can be critical to understand the outcome with a given system (see considerations on the tag further below).

      Conditional inactivation systems:

      1. __Speed of inactivation:__ glmS acts on mRNA and diCre on the gene level, which makes them slower than techniques acting directly on the protein such as DD or KS. With diCre, mRNA and protein are still left, even if the gene is very rapidly excised. For instance, for Kelch13 it takes 3-4 days after excising the gene until protein levels have waned enough that this manifests in reduced growth (Birnbaum et al., 2017). While in some instances diCre permits same cycle analyses if the protein has a very rapid turn-over (e.g. Rab5a, (Birnbaum et al., 2017)), control in a few hours is still difficult. For vesicle accumulation and bloated food vacuole assays, which are done over comparably short time frames and with specific stages, it is rather challenging to hit the correct time of induction so that all cells are at the correct stage with sufficiently (and uniformly, i.e. in all cells) reduced target protein levels during the assay time. Slow acting systems are also more prone to secondary effects. The more immediate the inactivation, the closer it is to the core of the affected function. With vesicle trafficking processes this is particularly relevant as all vesicle trafficking in a cell is interconnected and there are always recycling pathways that maintain the membrane and protein homeostasis of individual compartments. Particularly for endocytosis there seem to be compensatory capacities at least in other organisms (see e.g. (Chen and Schmid, 2020)). One reason why knock sideways was developed is that it avoids compensatory changes when vesicle adaptors are inactivated (Robinson et al., 2010).

      The comparably short time frame for malaria parasites to go through different stages during blood stage development also is an issue relevant for inactivation speed. The advantage of speed and the danger of obscured phenotypes is highlighted by our work on VPS45 which showed that in trophozoites this protein is involved in the transport of hemoglobin to the FV whereas in late stages it also has a role in secretory processes. Both of these functions we were able to specifically assess in the same growth cycle using KS to rapidly inactivate the protein (Bisio et al., 2020) but with a slower system would have been more complicated to dissect.

      Speed of effect with glmS: unless the KS does not work well, glmS is slower acting than KS (it does not target the already synthesised protein, which can remain in the cell) and also often suffers from only partial inactivation; hence the benefit of using it here is unclear. The option to have an untagged protein is a plus, but it is also a minus, as assessing efficiency is critical to ensure the phenotype manifests at the stage of interest (particularly in live cells, e.g. for bloated food vacuole assays, a fluorescent tag is the only direct option to assess inactivation of the target).

      lethality/absolute phenotypic effects are detrimental to some assays to study the functions we are interested in for this work: no RSA can be conducted, if the gene is lost and the parasites die. Again, with diCre, one could attempt to hit the point when the parasites have lost sufficient amounts of the target protein when they are placed under ART but then the parasites need to continue growing for ~3 days, which is not possible if the cKO is lethal except for very slowly turning over proteins. However, in that latter case, the parasites likely still had full functionality of the target protein at the beginning of the RSA, when the drug pulse happens and there would be no effect. Knock sideways solves these problems by permitting knock sideways inactivation only under ART (or with a few hours pre-incubation depending on the inactivation speed) to not yet affect growth in a severe manner but inhibiting the process the protein is involved in. It may be possible to use glmS for RSAs, but the slow speed would complicate it (it would not permit control of target protein levels in a matter of a few hours to inactivate the target protein and then re-install it).

      Non-absolute inactivation is also a strength for some functional assays. While we really like using diCre, in the case of EXP1 it made it necessary to complement the exp1 cKO parasites with low levels of EXP1 to be able to do functional assays without killing the parasites (Mesén-Ramírez et al., 2019; Mesén-Ramírez et al., 2021). While the lethality issue does not apply to glmS (like knock sideways, it also can be tuned), it is unclear what would be gained over knock sideways. Knockdown levels with glmS vary from gene to gene and cannot be predicted, it is in most cases considerably slower than KS, it requires glucosamine, which becomes toxic at higher concentrations and might introduce off-target effects, and tracking protein levels during the assay would equally need GFP tagging.

      Integration of properties of conditional systems

      Given the properties discussed above, several factors have to be considered to be able to use a system for a given assay. Stage-specific transcription is one example. For diCre, a protein not expressed in e.g. rings permits removal of the gene so that the protein is never made in that parasite development cycle. We exploited this for instance for two proteins only expressed from the trophozoite stage onwards (Kimmel et al., 2023). However, if lethal (absolute effect problem), this also means one can only see the phenotype at the onset of expression of the target (e.g. if in mitosis, the first nuclear division in case the protein is absolutely essential for the process). This is just one example of such issues. Expression timing, turnover of the protein and homogeneity of stage-specific loss of protein will all influence how clearly the phenotype can be determined. All this will decide the exact time of loss/inactivation of the target protein to levels generating a phenotype, which ideally should therefore be monitored during an assay (see considerations on tagging).

      For these reasons vesicle accumulation or bloated food vacuole assays are difficult with slow systems as ideally the target should rapidly be inactivated at the trophozoite stage and the result monitored before the cells have moved to the schizont stage. For this a well responding knock sideways is ideal as the protein can be rapidly taken away (sometimes within seconds) to visualise the immediate, direct effect in the cell.

      As shown for KIC11, there is also no disadvantage of using KS for proteins with other assays or proteins that result in different phenotypes. It permits stage-specific same cycle inactivation without having to worry about the turnover of mRNA and protein (Fig. 2F,G). Thus, besides the advantages of knock sideways for endocytosis related assays and RSAs, we also see no disadvantage of using knock sideways for the functional study of KIC11 which has a role other than endocytosis. KS also permits to specifically target the K13 pool of KIC12, something impossible or very difficult to do with other systems. Hence, we are of the opinion that the system for inactivation was adequate for most of the proteins analysed in this manuscript.

      Large tag: we agree that GFP-tagging can be a disadvantage but in our opinion its benefits often outweigh the drawbacks because it permits easy and immediate (on individual cell level, if need be) monitoring of the presence/location of the target protein (e.g. after KS, but given the discrepancy of the timing between gene excision and protein loss, it might be even more important for techniques such as diCre). No fixing/permeabilisation (prone to artifacts, prevents immediate view of cells) to detect a target with specific antibodies or via a small tag is needed with GFP. Similarly, the use of Western blots to do this is time consuming and impractical if monitoring of left-over protein in the course of an assay such as a bloated food vacuole assay is needed.

      In many cases, adding GFP has no negative effect. In addition, if the bulky folded structure of GFP is tolerated, it usually also tolerates the 2 to 4 12kDa FKBP domains in our standard tag. We also typically add a linker. This approach has worked for a large number of different proteins, including many essential ones for which we would not otherwise have obtained the integration cell lines (Birnbaum et al., 2017; Jonscher et al., 2019; Hoeijmakers et al., 2019; Birnbaum et al., 2020; Kimmel et al., 2023; Sabitzki et al., 2023). Hence, whenever a cell line is obtained with it, this tag in most cases is not a disadvantage. Admittedly an exception in this is MyoF and to some extent maybe MCA2 (we would like to stress that in the case of MCA2 the reason for not being able to obtain the full length tagged cell line is unclear: the protein can be severely truncated to less than 3% of its amino acid sequence and a GFP-tag is tolerated on the version with 2/3s of the protein left, which gives no good reason why the full length was not obtained; a potential reason could be a dominant negative effect). However, we obtained the full length with a small tag detected by IFA for both, MyoF and MCA2 and the location of these agreed well with the GFP tagged versions, indicating that the GFP-tagged versions are useful to show the location of these proteins in live cells.

      There are also tricks to monitor the effect of e.g. diCre without tagging the target. For instance, a fluorescent protein can be linked to excision without being fused to the target (i.e. excision of the gene leads to expression of e.g. GFP), which avoids adding a tag to the target itself. However, the problem with this is that expression of GFP only shows excision, while mRNA encoding the target protein and left-over target protein may still be present in the cell. All in all, the GFP-tag on the target, while it has some drawbacks, is still our preferred method to monitor the target protein in the cell (in principle permitting quantification of ablation efficiency on the individual cell level).

      Conclusion on these considerations for this manuscript

      Based on these considerations we do not see the immediate benefit of changing the system for the conclusions drawn from this study and are unsure if the suggested systems are indeed better suited for this work. While a more exact readout of "essentiality" might be possible with the diCre system, we are of the opinion that this is less important than learning the function of a protein, which - as outlined above - we believe to be considerably more difficult with diCre and even more so with glmS considering our target functions. The same applies to targeting specific cellular pools of a protein, as done here for KIC12. Clearly MyoF is one example where the employed system shows limitations, but with the new Figure part showing consistency in phenotype with degree of inactivation (importantly with two different forms of inactivation) and the clarification that the locations of the GFP-tagged and HA-tagged versions are actually quite similar, we do not think employing an extra system is warranted for the conclusions of this work. Admittedly, the apparent lack of need in ring stages might give an opening to attack MyoF using diCre (by excision before its major expression peak), but depending on lethality this might preclude extended analyses (possibly vesicle assays, for sure not RSAs).

      In the end, the question is whether our approach reveals the function of the targets analysed in this work, and based on the data in our manuscript and the arguments in the rebuttal, we are reasonably confident that this is the case. It is not very likely that the other mentioned techniques would result in a different conclusion on the function of the proteins studied here. In fact, we expect other commonly used techniques to be less suitable for the key assays in this work.

      References used in our responses to the reviewers’ comments:

      Behrens, H.M., Schmidt, S., Peigney, D., Sabitzki, R., Henshall, I., May, J., et al. (2023) Impact of different mutations on Kelch13 protein levels, ART resistance and fitness cost in Plasmodium falciparum parasites. bioRxiv 2022.05.13.491767.

      Behrens, H.M., Schmidt, S., and Spielmann, T. (2021) The newly discovered role of endocytosis in artemisinin resistance. Med Res Rev med.21848.

      Behrens, H.M., and Spielmann, T. (2023) Identification of domains in Plasmodium falciparum proteins of unknown function using DALI search on Alphafold predictions. bioRxiv 2023.06.05.543710.

      Birnbaum, J., Flemming, S., Reichard, N., Soares, A.B., Mesén-Ramírez, P., Jonscher, E., et al. (2017) A genetic system to study Plasmodium falciparum protein function. Nat Methods 14: 450–456.

      Birnbaum, J., Scharf, S., Schmidt, S., Jonscher, E., Hoeijmakers, W.A.M., Flemming, S., et al. (2020) A Kelch13-defined endocytosis pathway mediates artemisinin resistance in malaria parasites. Science 367: 51–59.

      Bisio, H., Chaabene, R. Ben, Sabitzki, R., Maco, B., Baptiste Marq, J., Gilberger, T.W., et al. (2020) The zip code of vesicle trafficking in apicomplexa: Sec1/munc18 and snare proteins. MBio 11: 1–21.

      Blum, M., Chang, H.Y., Chuguransky, S., Grego, T., Kandasaamy, S., Mitchell, A., et al. (2021) The InterPro protein families and domains database: 20 years on. Nucleic Acids Res 49: D344–D354.

      Borrmann, S., Straimer, J., Mwai, L., Abdi, A., Rippert, A., Okombo, J., et al. (2013) Genome-wide screen identifies new candidate genes associated with artemisinin susceptibility in Plasmodium falciparum in Kenya. Sci Rep 3.

      Bozdech, Z., Llinás, M., Pulliam, B.L., Wong, E.D., Zhu, J., and DeRisi, J.L. (2003) The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol 1: e5.

      Burnette, W.N. (1981) “Western Blotting”: Electrophoretic transfer of proteins from sodium dodecyl sulfate-polyacrylamide gels to unmodified nitrocellulose and radiographic detection with antibody and radioiodinated protein A. Anal Biochem 112: 195–203.

      Casella, J.F., Flanagan, M.D., and Lin, S. (1981) Cytochalasin D inhibits actin polymerization and induces depolymerization of actin filaments formed during platelet shape change. Nature 293: 302–305.

      Cerqueira, G.C., Cheeseman, I.H., Schaffner, S.F., Nair, S., McDew-White, M., Phyo, A.P., et al. (2017) Longitudinal genomic surveillance of Plasmodium falciparum malaria parasites reveals complex genomic architecture of emerging artemisinin resistance. Genome Biol 18: 78.

      Chen, Z., and Schmid, S.L. (2020) Evolving models for assembling and shaping clathrin-coated pits. J Cell Biol 219.

      Dell’Angelica, E.C., Puertollano, R., Mullins, C., Aguilar, R.C., Vargas, J.D., Hartnell, L.M., and Bonifacino, J.S. (2000) GGAs: A family of ADP ribosylation factor-binding proteins related to adaptors and associated with the Golgi complex. J Cell Biol 149: 81–93.

      Demas, A.R., Sharma, A.I., Wong, W., Early, A.M., Redmond, S., Bopp, S., et al. (2018) Mutations in Plasmodium falciparum actin-binding protein coronin confer reduced artemisinin susceptibility. Proc Natl Acad Sci 201812317.

      Henrici, R.C., Edwards, R.L., Zoltner, M., Schalkwyk, D.A. van, Hart, M.N., Mohring, F., et al. (2020a) The plasmodium falciparum artemisinin susceptibility-associated ap-2 adaptin μ subunit is clathrin independent and essential for schizont maturation. MBio 11.

      Henrici, R.C., Schalkwyk, D.A. van, and Sutherland, C.J. (2020b) Modification of pfap2μ and pfubp1 Markedly Reduces Ring-Stage Susceptibility of Plasmodium falciparum to Artemisinin in Vitro. Antimicrob Agents Chemother 64.

      Henriques, G., Hallett, R.L., Beshir, K.B., Gadalla, N.B., Johnson, R.E., Burrow, R., et al. (2014) Directional selection at the pfmdr1, pfcrt, pfubp1, and pfap2mu loci of Plasmodium falciparum in Kenyan children treated with ACT. J Infect Dis 210: 2001–2008.

      Heredero-Bermejo, I., Varberg, J.M., Charvat, R., Jacobs, K., Garbuz, T., Sullivan, W.J., and Arrizabalaga, G. (2019) TgDrpC, an atypical dynamin-related protein in Toxoplasma gondii, is associated with vesicular transport factors and parasite division. Mol Microbiol 111: 46–64.

      Hirst, J., Lui, W.W.Y., Bright, N.A., Totty, N., Seaman, M.N.J., and Robinson, M.S. (2000) A family of proteins with γ-adaptin and VHS domains that facilitate trafficking between the trans-golgi network and the vacuole/lysosome. J Cell Biol 149: 67–79.

      Hirst, J., and Robinson, M.S. (1998) Clathrin and adaptors. Biochim Biophys Acta - Mol Cell Res 1404: 173–193.

      Hoeijmakers, W.A.M., Miao, J., Schmidt, S., Toenhake, C.G., Shrestha, S., Venhuizen, J., et al. (2019) Epigenetic reader complexes of the human malaria parasite, Plasmodium falciparum. Nucleic Acids Res 47: 11574–11588.

      Jonscher, E., Flemming, S., Schmitt, M., Sabitzki, R., Reichard, N., Birnbaum, J., et al. (2019) PfVPS45 Is Required for Host Cell Cytosol Uptake by Malaria Blood Stage Parasites. Cell Host Microbe 25: 166-173.e5.

      Kimmel, J., Schmitt, M., Sinner, A., Jansen, P.W.T.C., Mainye, S., Ramón-Zamorano, G., et al. (2023) Gene-by-gene screen of the unknown proteins encoded on Plasmodium falciparum chromosome 3. Cell Syst 14: 9-23.e7.

      Koreny, L., Mercado-Saavedra, B.N., Klinger, C.M., Barylyuk, K., Butterworth, S., Hirst, J., et al. (2023) Stable endocytic structures navigate the complex pellicle of apicomplexan parasites. Nat Commun 14: 2167.

      Kumari, V., Singh, A.P., Singh, J., Sharma, R., Akhter, M., Mishra, P.K., et al. (2018) Biochemical characterization of unusual cysteine protease of P. falciparum, metacaspase-2 (MCA-2). Mol Biochem Parasitol 220: 28–41.

      Lazarus, M.D., Schneider, T.G., and Taraschi, T.F. (2008) A new model for hemoglobin ingestion and transport by the human malaria parasite Plasmodium falciparum. J Cell Sci 121: 1937–1949.

      Lopez-Hernandez, F.J., Ortiz, M.A., Bayon, Y., and Piedrafita, F.J. (2003) Z-FA-fmk inhibits effector caspases but not initiator caspases 8 and 10, and demonstrates that novel anticancer retinoid-related molecules induce apoptosis via the intrinsic pathway. Mol Cancer Ther 2: 255–263.

      Lord, S.J., Velle, K.B., Mullins, R.D., and Fritz-Laylin, L.K. (2020) SuperPlots: Communicating reproducibility and variability in cell biology. J Cell Biol 219.

      MalariaGEN, Ahouidi, A., Ali, M., Almagro-Garcia, J., Amambua-Ngwa, A., Amaratunga, C., et al. (2021) An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. Wellcome open Res 6: 42.

      Marti, M., Good, R.T., Rug, M., Knuepfer, E., and Cowman, A.F. (2004) Targeting malaria virulence and remodeling proteins to the host erythrocyte. Science 306: 1930–3.

      Mesén-Ramírez, P., Bergmann, B., Elhabiri, M., Zhu, L., Thien, H. von, Castro-Peña, C., et al. (2021) The parasitophorous vacuole nutrient pore is critical for drug access in malaria parasites and modulates the fitness cost of artemisinin resistance. Cell Host Microbe 0: 283.

      Mesén-Ramírez, P., Bergmann, B., Tran, T.T., Garten, M., Stäcker, J., Naranjo-Prado, I., et al. (2019) EXP1 is critical for nutrient uptake across the parasitophorous vacuole membrane of malaria parasites. PLoS Biol 17: e3000473.

      Mukherjee, A., Crochetière, M.-È., Sergerie, A., Amiar, S., Thompson, L.A., Ebrahimzadeh, Z., et al. (2022) A Phosphoinositide-Binding Protein Acts in the Trafficking Pathway of Hemoglobin in the Malaria Parasite Plasmodium falciparum. MBio 13.

      Otto, T.D., Wilinski, D., Assefa, S., Keane, T.M., Sarry, L.R., Böhme, U., et al. (2010) New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq. Mol Microbiol 76: 12–24.

      Robinson, M.S., Sahlender, D.A., and Foster, S.D. (2010) Rapid Inactivation of Proteins by Rapamycin-Induced Rerouting to Mitochondria. Dev Cell 18: 324–331.

      Sabitzki, R., Schmitt, M., Flemming, S., Jonscher, E., Hoehn, K., Froehlke, U., and Spielmann, T. (2023) Identification of a Rabenosyn-5 like protein and Rab5b in host cell cytosol uptake reveals conservation of endosomal transport in malaria parasites. bioRxiv 2023.04.05.535711.

      Simwela, N. V., Hughes, K.R., Roberts, A.B., Rennie, M.T., Barrett, M.P., and Waters, A.P. (2020) Experimentally engineered mutations in a ubiquitin hydrolase, UBP-1, modulate in vivo susceptibility to artemisinin and chloroquine in plasmodium berghei. Antimicrob Agents Chemother 64.

      Spielmann, T., Gras, S., Sabitzki, R., and Meissner, M. (2020) Endocytosis in Plasmodium and Toxoplasma Parasites. Trends Parasitol 36: 520–532.

      Subudhi, A.K., O’Donnell, A.J., Ramaprasad, A., Abkallo, H.M., Kaushik, A., Ansari, H.R., et al. (2020) Malaria parasites regulate intra-erythrocytic development duration via serpentine receptor 10 to coordinate with host rhythms. Nat Commun 11.

      Traub, L.M., Downs, M.A., Westrich, J.L., and Fremont, D.H. (1999) Crystal structure of the α appendage of AP-2 reveals a recruitment platform for clathrin-coat assembly. Proc Natl Acad Sci U S A 96: 8907–8912.

      Wagner, M.P., Formaglio, P., Gorgette, O., Dziekan, J.M., Huon, C., Berneburg, I., et al. (2022) Human peroxiredoxin 6 is essential for malaria parasites and provides a host-based drug target. Cell Rep 39: 110923.

      Wall, R.J., Zeeshan, M., Katris, N.J., Limenitakis, R., Rea, E., Stock, J., et al. (2019) Systematic analysis of Plasmodium myosins reveals differential expression, localisation, and function in invasive and proliferative parasite stages. Cell Microbiol 21.

      Wan, W., Dong, H., Lai, D.-H., Yang, J., He, K., Tang, X., et al. (2023) The Toxoplasma micropore mediates endocytosis for selective nutrient salvage from host cell compartments. Nat Commun 14: 977.

      Wichers-Misterek, J.S., Binder, A.M., Mesén-Ramírez, P., Dorner, L.P., Safavi, S., Fuchs, G., et al. (2023) A Microtubule-Associated Protein Is Essential for Malaria Parasite Transmission. MBio.

      Wichers, J.S., Gelder, C. van, Fuchs, G., Ruge, J.M., Pietsch, E., Ferreira, J.L., et al. (2021a) Characterization of Apicomplexan Amino Acid Transporters (ApiATs) in the Malaria Parasite Plasmodium falciparum. mSphere 6.

      Wichers, J.S., Mesén-Ramírez, P., Fuchs, G., Yu-Strzelczyk, J., Stäcker, J., Thien, H. von, et al. (2022) PMRT1, a Plasmodium -Specific Parasite Plasma Membrane Transporter, Is Essential for Asexual and Sexual Blood Stage Development. MBio 13.

      Wichers, J.S., Scholz, J.A.M., Strauss, J., Witt, S., Lill, A., Ehnold, L.-I., et al. (2019) Dissecting the Gene Expression, Localization, Membrane Topology, and Function of the Plasmodium falciparum STEVOR Protein Family. MBio 10: e01500-19.

      Wichers, J.S., Tonkin-Hill, G., Thye, T., Krumkamp, R., Kreuels, B., Strauss, J., et al. (2021b) Common virulence gene expression in adult first-time infected malaria patients and severe cases. Elife 10.

      Wichers, J.S., Wunderlich, J., Heincke, D., Pazicky, S., Strauss, J., Schmitt, M., et al. (2021c) Identification of novel inner membrane complex and apical annuli proteins of the malaria parasite Plasmodium falciparum. Cell Microbiol 23: e13341.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      With the emergence and spread of resistance to Artemisinin (ART), a key component of current frontline malaria combination therapies, there is a growing effort to understand the mechanisms that lead to ART resistance. Previous work has shown that ART resistant parasites harbour mutations in the Kelch13 protein, which in turn leads to reduced endocytosis of host haemoglobin. The digestion of haemoglobin is thought to be critical for the activation of the artemisinin endoperoxide bridge, leading to the production of free radicals and parasite death. However, the mechanisms by which the parasites endocytose host cell haemoglobin remain poorly understood.

      Previous work by the authors identified several proteins in the proximity of K13 using proximity-based labelling (BioID) (Birnbaum et al. 2020). The authors then went on to characterise several of these proteins, showing that when proteins including EPS15, AP2mu, UBP1 and KIC7 are disrupted, this leads to ART resistance and defects in endocytosis leading to the hypothesis that these two processes are inextricably linked.

      In this manuscript, Schmidt et al. set themselves the task of characterising more K13 component candidates identified in their previous work (Birnbaum et al. 2020) that were not previously validated or characterised. They chose 10 candidates and investigated their localisations, their colocalisation with K13, and their involvement in endocytosis and in vitro ART resistance, 2 processes mediated by K13 and some members of the K13 compartment.

      The authors show that of their 10 candidates, only 4 can be co-localised with K13. Then, using a combination of targeted gene disruption (TGD) as well as knock sideways (KS), they characterised these 4 proteins found in the K13 compartment. They show that MyoF and KIC12 are involved in endocytosis and are important for parasite growth, however their disruption does not lead to a change in ART sensitivity. The authors also confirm the findings of their previous publication (Birnbaum et al. 2020), using a slightly different TGD, that MCA2 is involved in ART resistance, however they did not check whether its disruption impacts haemoglobin uptake. They also show that KIC11 is not involved in mediating haemoglobin uptake or ART resistance. To finish, the authors used AlphaFold to identify new domains in the proteins of the K13 compartment. This led them to the conclusion that vesicle trafficking domains are enriched in proteins of the K13 compartment involved in endocytosis and in vitro ART resistance.

      The majority of the experiments conducted by the authors are performed to a good standard in biological and technical replicates, with the correct controls. Their findings provide confirmation that their 4 candidate genes seem to be important for parasite growth, and show that some of their candidates are involved in endocytosis. While the KD and KS approaches employed by the authors to study their candidate genes each have their own advantages and can be excellent tools for studying large sets of genes, this manuscript highlights the many limitations of these approaches. For example, the large tag used for the KS approach can mislocalise proteins or disrupt their function (as is the case for MyoF), resulting in spurious results, or indeed the inability to generate the tagged line (as is the case for MCA2). The KS approach also makes the results of a protein with a dual localisation, like KIC12, extremely difficult to interpret.

      Moreover, the manuscript is disjointed at times, with the authors choosing to conduct certain experiments for only a subset of genes, but not for others. For example, considering that the aim of this paper was to identify more proteins involved in ART resistance and endocytosis, it is confusing why the authors do not perform the endocytosis assays for all their selected proteins, and why they do not do this for the proteins they identify in their domain search. There is significant room for improvement for this manuscript, and a generally interesting question. But in its current format, other than confirming that MCA2 is involved in ART resistance (which was already known from the Birnbaum paper), the authors do not further expand our understanding of the link between ART resistance and endocytosis in this manuscript.

      Major Comments

      line 31: please change defined to characterised - defined suggests that novel proteins were identified in this study, which is not the case.

      line 37: please change 'second' to "another". As explained further below, the authors identified 3 classes of proteins (confer ART resistance + involved in HCCU, involved in HCCU only, or involved in neither).

      Line 40: You define KIC11 as essential but according to your data some parasites are still alive and replicating 2 cycles after induction of the knock sideways. Please consider changing "essential" to "important for asexual parasite growth"

      Line 40: please change 'second group' to 'this group'

      line 41: state here that despite it being essential, it is unknown what it is involved in.

      Line 50: the authors should state here that there is actually a reversal in this trend over the last few years.

      Line 54: please separate out the references for each of the two statements made in this line (a: that ART resistance is widespread in SEA, and b: that ART resistance is now in Africa) Reference 14 also seems to reference ART resistance in Amazonia - which is not covered by the statement made by the authors (in which case the authors should state ART is now present in Africa and South America). The authors should also reference PMID: 34279219 for their statement that ART resistance is now found in Africa (albeit a different mutation to the one found in SEA).

      Line 65: it is also worth mentioning here that there are other mutations in proteins other than K13, such as AP2mu and UBP1 (PMID: 24994911;24270944) that can lead to ART resistance.

      Line 80, 86: ref 43 is misused. Reference 43 refers to Maurer's clefts trafficking which takes place in the erythrocyte cytosol and is not involved in haemoglobin uptake as far as I know. Please replace ref 43 with one showing the role of actin in haemoglobin uptake.

      Line 98: the authors state here that they 'identified' further candidates from the K13 proxiome. This suggests that they identified new proteins in this paper, when in fact the list was already generated in ref 26. All they did was characterise proteins from that list that were not previously characterised. The authors should therefore remove identified from this statement.

      Line 107-108: it is not clear from this sentence why these proteins were left out of the initial analysis in Ref 26. A sentence here explaining this would be valuable for the reader.

      Line 117-123: The authors say that PF3D7_0204300, PF3D7_1117900 and PF3D7_1016200 were not studied because they were not in the top 10 hits. However, the current organisation of Supplementary Table 1 shows all 3 proteins among the top 10 hits (MyoF, KIC12, UIS14 and 0907200 being after them). I think the authors should reorganise their table. It is also unclear according to what the proteins in the table are ranked. Could the authors indicate the metric used for the ranking?

      Line 129-141: Can the authors be clearer with their explanations of the identification of mutation Y1344Stop? One dataset (ref 61) shows that 52% of African parasites have a mutation in MCA2 in position 1344 leading to a STOP codon. But another dataset (ref 62) shows that the next base is also mutated, reverting the stop codon. That should have been seen in the first dataset as well. Could the authors please clarify.

      Line 147: the authors say that MCA2 is expressed throughout the intraerythrocytic cycle as shown by live cell imaging. In Birnbaum et al 2020 fig 4I, the authors show that MCA2 is mainly expressed between 4 and 16hpi. But in Figure 1B of this manuscript there is a clear multiplication of MCA2 signal between trophozoite and schizont. How do the authors explain this discrepancy? Could expression of the truncated MCA2 be different than the full length? This cannot be assessed as expression and localisation of the full-length HA tag MCA2 is not shown in Schizonts. MCA2 expression seems also different for the MCA2TGD-GFP with no expression in rings.

      Line 158: would it not have been more useful for the authors to have episomally expressed MCA2-3xHA in their MCA2Y1344STOP-GFPENDO line to make sure that the truncated protein is indeed going to the correct compartment? The experiments done by the authors suggests that the MCA2Y1344STOP goes to the right location but does not really confirm it.

      Line 191: it is stated that MCA2 confers resistance independently of the MCA domain, however in both the MCA2-TGD and MCA2Y1344STOP-GFPENDO parasites, the MCA domain is deleted, and for both parasites, there is resistance (albeit to a lower level in the MCA2Y1344STOP-GFPENDO line). Therefore, how can the authors state that the ART resistance is independent of the MCA domain? This statement should be that resistance is dependent on the loss of the MCA domain.

      Line 192: Why did the authors not check if MCA2 is involved in endocytosis? They state later on in the manuscript that they did not do endocytosis assays with TGD lines, however if the authors include the correct controls, this could be easily done. It would also be really interesting to see whether endocytosis gets progressively worse going from WT to MCA2Y1344STOP to MCA2TGD. This experiment (as well as doing endocytosis assays for KIC4 and KIC5 TGD lines) would drastically increase the impact of this study. These experiments would not take more than 3 weeks to perform, and would not require the generation of new lines.

      The authors should consider re-organising the MCA2 section, first showing that the 3xHA tagged line colocalises with K13, then performing the new truncation.

      Line 197: Once again ref 43 is not correct to illustrate that actin/myosin is involved in endocytosis

      Line 202: the authors state that MyoF localises near the food vacuole from ring stage/trophs onwards. However, how can this statement be made in schizonts based on these images (Fig. 2A), where it doesn't look like MyoF is anywhere near the FV? This statement can only be made for schizonts if co-localised with a FV marker (which is done in Fig. 2B), however, based on the number of MyoF foci, it appears that this was not done for schizonts. Please either remove the statement that MyoF is near the food vacuole from trophs onwards (because it is only seen near the FV up until trophs) or show the data in Fig. 2B of schizonts to substantiate these claims.

      Line 204-206: what does this statement bring to the paper? Is it to show that this is the real localisation of MyoF because the two tagged cell lines show the same localisation? I don't think this is needed, especially as later in the manuscript an HA-tagged MyoF line is used and shows similar localisation.

      Line 212: The overlap of K13 with MyoF in Fig 2C 3rd panel (1st trophozoite panel) is not obvious, especially as the MyoF signal seems nonexistent. I would advise the authors to replace with a better image. Also, why are there no images of schizonts shown in Figure 2C?

      Line 217: the spatial association of MyoF with K13 is very different when it is tagged with GFP and when it is tagged with 3xHA. The way the authors word it here, it seems that there is agreement with the two datasets, when this is not in fact the case (59% overlap for MyoF-GFP and only 16% overlap with MyoF-3xHA). These data suggest that the GFP and the multiple FKBP tags are doing something to the protein and therefore maybe the ensuing results using this line should not be trusted or be taken with a pinch of salt.

      Line 219: the authors state here that they could not detect MyoF-GFP in rings, when in Figure 2C they show MyoF-GFP in rings, and also show that they could detect MyoF in Sup Fig. 3B with the 3xHA tagged line. Is this a labelling mistake in Figure 2C? If the authors could indeed not see MyoF-GFP in rings, this statement should have been made when Figure 2A was presented, and not so late in the manuscript, which causes confusion.

      Line 237: Showing a DNA marker (DAPI, Hoechst) for Figure 2E, and subsequent figures using mislocalisation to the nucleus, would help the reader assess efficiency of the mislocalisation.

      Line 254-256: authors should show the results of the bloating assay for parental 3D7 parasites (+ and - rapalog) to see whether the MyoF line - rapalog has increased baseline bloating. This applies to all subsequent FV bloating assays.

      Line 254-257: The authors say that because fewer parasites show a bloated food vacuole upon inactivation of MyoF it means that less hemoglobin reached the food vacuole. I understand the authors' statement, however, shouldn't they look at the size of the food vacuole, instead of the number of parasites with bloated FV, to make such a statement? This has been done for KIC12, so why not do it for MyoF?

      Line 259-261: these results would be difficult to interpret namely because the authors have dying parasites, which is exacerbated with the protein being knocked sideways. The authors should mention the pitfalls of their knock sideways and tagging design here.

      Line 260-261: RSA is an assay relying on measuring parasite growth 1 cycle after a challenge with ART for 6 hours.

      Line 261-263: the authors state that MyoF has a function in endocytosis but at a different step compared to K13 compartment proteins. I am not sure what they mean here. Can this be clarified? Do the authors mean that it is involved in endocytosis but not in ART resistance? If so, this is a very difficult statement to make since the parasites are dying. Is there any evidence of point mutations in MyoF in the field?

      Line 298: the authors state that there is no growth defect in the first cycle when rapalog is added to the KIC11 line, however based on Figure 3D, there is evidently a 25% reduction in growth compared to - rapalog at day 1 post treatment, and a 60% reduction by day 2, which is still within the 1st growth cycle. The authors should either revise their statement or provide an explanation for these findings. The authors should also explain why their Giemsa data in Fig. 3E is not in accordance with their FACS data.

      Line 301: KIC11 could also be important very early for establishment of the ring stage for example for establishment of the PV. Also, was mislocalisation assessed in rapalog-treated parasites at 72 hours or in cycle 3?

      Line 311: the authors should change the sentence from 'not related to endocytosis' to 'not related to endocytosis or ART resistance'.

      Line 323-325: Authors say that a nuclear GFP signal can be observed in early schizonts for KIC12. According to the pictures provided in Figure 4A and Figure S5A it is not very obvious. Also faint cytoplasmic GFP signal could only be background as we can see that exposure is higher for schizont pictures

      Line 326-328: The authors say that kic12 transcriptional profile indicate mRNA levels peak (no s at peak) in merozoites. Should they show live cell imaging of merozoites then? Because from the Figure 4A schizont pictures where schizonts are almost fully segmented no signal can be observed.

      Line 347: The authors state that using the Lyn mislocaliser the nuclear pool of KIC12 is inactivated by mislocalisation to the PPM. This tends to suggest that only the nuclear pool of KIC12 is mislocalised. How is it possible that only the nuclear pool is mislocalised?

      Line 368-369: Effect was also only partial for MyoF. Why didn't you measure the same metrics for MyoF?

      Line 379: you don't know if all proteins acting later in endocytosis will have an increased number of vesicles as a phenotype.

      Line 413-414: The authors state that no growth defect was observed upon KS of 1365800. Is growth alone enough to say that there is no impact on endocytosis?

      Line 432: in this section, the authors state that KIC4 and KIC5 seem to have domains that may suggest these proteins are involved in endocytosis, based on the alpha fold data that is publicly available. Considering the authors have TGD-SLI versions of these lines (Birnbaum et al. 2020) and have already confirmed in this previous publication that they confer resistance to ART; it would make sense to look at endocytosis for these genes. This would be a relatively simple and straightforward experiment, taking no longer than two to three weeks, and would require no additional reagents or line generation. Doing these experiments would add a lot more weight to this final section. The authors later state that KIC4 and 5 are TGD lines, so not the best for endocytosis assays. It is unclear why this would be difficult to do if an adequate control is contained in the experiment (such as parental 3D7). It explains why they did not perform the MCA2 endocytosis assays further up, but in my opinion, an attempt at doing these assays is important and would significantly increase the impact of this paper.

      Line 490-493: the authors state that the K13 compartment proteins fall in two groups, some that are involved in ART resistance AND endocytosis, and some that have different functions. However, in this manuscript the authors have demonstrated 3 flavours that K13 compartment proteins can come in:

      • Some that confer ART resistance and are involved in HCCU (MCA2)

      • Some that are involved in HCCU but not ART resistance (MyoF & KIC12)

      • Some that are involved in neither (KIC11)

      The authors should therefore revise this statement.

      Line 508: the authors state that they expanded the repertoire of K13 compartments, when in fact they functionally analysed them - they did not do another BioID to identify more candidates.

      Line 570-572: has anyone ever tested whether CytoD or JAS treatment in rings, is sufficient to mediate ART resistance? Something similar to what was done in PMID 21709259 with protease inhibitors. If not this would be a pretty interesting experiment for the authors to do that could shed more light on the MyoF data. It would take maybe 2 weeks to do and not require the generation of any new lines. This would clarify whether other Myosins other than MyoF are involved in endocytosis, as is suggested by previous publications (PMID: 17944961).

      Line 608: inhibitors targeting the metacaspase domain of MCA2 may inadvertently inactivate other essential parts of the protein. The authors should acknowledge this possibility in the text.

      Line 624-625: the authors state that MyoF is 'lowly expressed in rings' - indeed this is the case in their MyoF-2xFKBP-GFP-2xFKBP line which the authors established has defects due to the tag, but it appears from their MyoF-3xHA tagged line that it is expressed in rings. The authors should therefore revise their statement, and be careful of making claims based on their defective line and using fluorescence imaging as their only metric. If they do want to make the statement that it is not there in rings, they should also do a western blot, which is much more sensitive since it amplifies the signal compared to an image of one parasite.

      Line 635: arguably this is the 3rd variety and not the 2nd (the authors already mentioned 2 types - ones that are involved in HCCU AND ART and those involved in HCCU only). See comment for line 490-493 above.

      Line 785: The bloated food vacuole assay/E64 hemoglobin uptake assay method specifies that a concentration of 33 mM E64 protease inhibitor was used. However, in reference 44, cited in the manuscript, a concentration of 33 µM E64 was used. Please confirm whether this is just a typo or whether a 1000x E64 concentration was used, which would render the experiment invalid.

      Line 788: it is unclear from this section what is considered a bloated food vacuole - is there an area above which the FV is considered bloated? Do the authors do these measurements manually or use an addon in FIJI/ImageJ? What is the cutoff for if a FV is bloated? Please clarify. Additionally, for the representative images + rapalog for Figures 2H and 4H, it would be useful to see where the authors delineate the FV (add a white circle showing what is actually measured).

      Line 863-864: this sentence seems to be out of place.

      Line 875: the authors state that there is a light blue wedge, when the circle consists of grey and black wedges. Please revise this.

      Line 1059-1061: it is unclear whether the individual growth curves are different clones or whether they are just the same experiment repeated? If it is the latter, then why are they not combined, as is traditionally done?

      Line 919-924: the authors mention a blue and red line, but there is only a black line in figure 3D. Moreover, the experiment of using the LYN mislocaliser was only done for KIC12 according to the manuscript. Additionally, the y axis of the figure states relative growth day 4[%] compared to rapalog, but then on the x axis there are several days. In the text it says there is no growth defect until the second cycle, but from this graph it appears the growth defect is evident as early as 1 day post rapalog treatment. Can the authors please clarify and correct the issues pointed out.

      Figure 1 panel B & C: the label of the figure where the signal from MCA2Y1344STOP-GFP is shown with the DAPI signal overlayed is deceptive since it suggests that this is the signal of full length MCA2. Please change the label of this panel from MAC2/DAPI to MCA2Y1344STOP/DAPI. The same is true for Panel C for the image labeled MCA2/K13 - please change this to MCA2Y1344STOP/K13.

      Figure 2B: what stages are these parasites? Please state this in the figure. Based on the MyoF pattern, it looks like rings in the upper panel and trophs in the bottom panel. Why were schizonts not shown?

      Figure 2D&F: it is not very meaningful when growth assays are shown as a final bar after 4 days of growth. It is much more useful and informative to see a growth curve instead (as is shown in the supplementary), since it shows if the defect is apparent in the first growth cycle or later. With the way the data is currently shown, this is not apparent. I would advise the authors to switch the graph in 2F out for a combined graph of all the biological replicates' growth curves from S3D, showing error bars.

      Figure 3: why were the calculations of FV area, parasite area and FV/parasite area only done for KIC12 and not done for MyoF? It would be interesting to see if any of these values are different for MyoF - whether the parasites are smaller in area and therefore FV smaller. Please present them in Figure 2. Images should be already available and would not require further experiments to be done, only the analysis.

      Figure 3B: why is there no spatial association assessment for KIC11 and K13 as was done for the MCA2 and MyoF? The authors should show a pie chart showing the degree of association here as was done for the other proteins.

      Figure 3D: The y axis of the figure states relative growth day 4[%] compared to rapalog, but then on the x axis the experiment takes place over several days. Is this a typo in the y axis? Additionally, the authors state in line 287-290 that the growth defect upon addition of rapalog is only seen in the second cycle, but from this graph it appears the growth defect is already evident 1 day post rapalog addition. The figure legend also does not make sense for this figure since it mentions a blue and a red line, when there is only a black line present. The legend also mentions the LYN mislocaliser which was used for KIC12 not KIC11 (see above).

      Figure 3E: the colour for Control and Rapalog 4 hpi are very similar and very hard to discern. Please choose an alternative colour or add a pattern to one of the samples. The y axis is also missing a label. Is this supposed to be parasitemia (%)?

      Figure 4A: the ring shown in this figure does not appear to be a ring (it is far too large and appears to have multiple nuclei?). Do the authors have any other representative images to show instead?

      Figure 4B: why is there no spatial association assessment for KIC12 and K13 as was done for the MCA2 and MyoF? The authors should show a pie chart showing the degree of association here as was done for the other proteins. This should be done for the different life cycle stages considering the changing localisation of KIC12.

      Figures 4C&E: it is extremely important to show the DNA stain in both these samples considering that a portion of KIC12 is in the nucleus! Please add the DAPI signal for these figures (as for all other figures!).

      Figure 4E: this figure should be presented before 4D (considering the line being presented in 4E is used in an experiment in 4D). The authors should switch the order of these two.

      It is unclear why in many of the fluorescence images the authors do not show the DAPI signal - particularly when colocalising with K13 and when doing the knock sideways experiments. Please add these images to the figures - I would assume they have already been taken, so this would simply involve adding the images to the panel.

      Throughout the manuscript, there is no western blot confirming the correct size of their modified proteins. This should be provided.

      None of the figures are appropriate for individuals with colour blindness, limiting their accessibility to the paper. Please change the colour schemes for all fluorescent images using magenta/green or an alternative colour combination appropriate for colourblind individuals.

      Minor Comments

      line 29: remove 'are'.

      Line 29: the text says "HCCU is critical for parasite survival but is poorly understood, with the K13 compartment proteins are among the few proteins so far functionally linked to this process." The sentence should be: "HCCU is critical for parasite survival but is poorly understood, with the K13 compartment proteins among the few proteins so far functionally linked to this process."

      line 44: remove 'the'

      Line 48: consider mentioning here that malaria is caused by the parasite Plasmodium - otherwise the first mention of parasite in line 52 is confusing for the non-specialist reader.

      Line 49: estimated malaria-related death and case numbers are from the 2021 WHO World malaria report. You cite the 2020 WHO World malaria report.

      Line 53: please insert the word 'have' between now and also.

      Line 54: please change 'was linked' to is linked

      Line 72: I would specify that free heme is toxic to the parasite. Especially as you mention that hemozoin is nontoxic. Sentence would be "where digestion results in the generation of free heme, toxic to the parasite, which is further converted into nontoxic hemozoin"

      Line 90: authors should either say "in previous works" or "in a previous work"

      Line 91: "We designated these proteins as K13 interaction candidates (KICs)"

      Line 95: please change 'rate' to number

      Line 109: Please include a comma before (ii).

      Line 112: as shown by Rudlaff et al in the paper you are citing, PPP8 is actually associated with the basal complex. You can say that "(ii) were either linked or had been shown to localise to the inner membrane complex (IMC) or the basal complex (PF3D7...)".

      Line 114: Protein PF3D7_1141300 is called APR1 in the manuscript but ARP1 in Supplementary Table 1. Please correct.

      Line 131: please define SNP - this is the first use of the acronym.

      Line 133-134: South-East Asia instead of "South Asia"

      Line 135: please explain what TGD is - it is referred to over and over again in the manuscript without ever being explained.

      Line 145: change 'Western blot' to western blot - only Southern blot is capitalised since it is named after an individual, while the other techniques are not.

      Line 152: add "the" between 'and spatial'

      Line 158: please define SLI as selected linked integration, since it is the first use of the acronym.

      Line 178: introduce a comma after protein. Sentence should be "Proliferation assays with the MCAY1344STOP-GFPendo parasites which express a larger portion of this protein, yet still lacking the MCA domain (Figure 1), indicated no growth ..."

      Line 195: the authors could mention that MyoF was previously called MyoC in the Birnbaum 2020 paper. I wanted to check back in the Birnbaum 2020 paper and could not find MyoF

      Line 200: "Expression and localisation of the fusion protein was analysed by fluorescent microscopy". Why was expression not also analysed by western blot, as was done for MCA2?

      Line 204: I could not find any mention of MyoF (Pf3D7_1329100) in reference 65. Please remove reference 65 if not correct. Also reference 66 looks at Plasmodium chabaudi transcriptomes so I would specify that "This expression pattern is in agreement with the transcriptional profile of its Plasmodium chabaudi orthologue"

      Line 208: Please indicate a reference for P40 being a marker of the food vacuole

      Line 220-224: The authors should consider changing to " Taken together these results show that MyoF is in foci that are mainly close to K13 and, at times, overlapping, indicating that MyoF is found in a regular close spatial association with the K13 compartment."

      Line 255: In Figure 2H, and subsequent figures showing bloated FV assay, I would delineate the food vacuole with dashed line as in Birnbaum et al. 2020 to help the reader understanding where the food vacuole is.

      Line 265-266: Here the title says that KIC11 is a K13 compartment associated protein, but the title of Figure 3 says KIC11 is a K13 compartment protein. I noticed that you make the difference between K13 compartment protein and K13 compartment associated protein for MyoF, for example, which is not clearly associated with the K13 compartment. Which one is it for KIC11?

      Line 309-310: indicate a reference for your statement "which is in contrast to previously characterised essential K13 compartment proteins".

      Line 377: Figure 4I, please correct 1st panel Y axis legend

      Line 404: replace "dispensability" with dispensable

      Line 416: can the authors provide any speculation as to why they observed these proteins as hits in the BioID experiments?

      Line 451: Where does the statement "97% of proteins containing these domains also contain an Adaptin_N domain and function in vesicle adaptor complexes as subunit" come from? Do you have a reference?

      Line 465-467: the same could be said for KIC4 as it also has a VHS domain.

      Line 477-479: Can be rephrased to "However, we found this protein as being likely dispensable for intra-erythrocytic parasite development and no colocalisation with K13 could be demonstrated, suggesting a limited role for PF3D7_1365800 in endocytosis." Or something like that. Makes it clearer.

      Line 535: Have AP-2 or AP-2 been shown to be at the K13 compartment?

      Line 569: reference 43 is wrong

      Line 746: typo "ot" instead of or.

      Line 801: method for Domain Identification using AlphaFold specify that RMSDs of under 5Å over more than 60 amino acids are listed in the results. However, there is a typo in Figure 5B for KIC5 where it says "RMSD 4.0 Å over 8 aa". Please correct.

      Line 856: In Figure 1E, please use the same Y axis legend as in Figure 2D "relative growth at day 4 [%] compared with 3D7"

      Figure S1: Some PCR gels checking for integration are presented as 5', 3' and ori whereas other gels are presented as ori, 5' and 3'. This is confusing. Figure S1: Why was the expression of only MCA2 verified by western blot? What about the other proteins?

      Line 493: Considering KIC11 was not involved in HCCU or ART resistance it might be worth mentioning in this section that it is of note that there are no domains detected that would be involved in endocytosis.

      Line 503-506: is it wise to generate more drugs that target a pathway that is already highly susceptible to mutations? The authors should add a statement explaining how this might be avoided.

      Throughout, scale bars are stated in the figure legends at the end of the legend. This is a slightly confusing format. The authors should consider stating the scale bar for each sub-legend where a fluorescence image is taken.

      Referees cross-commenting

      After reading reviewer 2 and 3's comments, I think there are significant overlaps in the key points raised in terms of questions about fusion proteins and their potential partial mis-localisation, better description of results and target selection. Overall I think we agree that the work has potential, but in its current form does not represent a major advance. It would be immensely helpful if the manuscript would be carefully edited for a better flow and linear description of results.

      Significance

      The authors set out to test whether other proteins that are in the vicinity of K13 are involved in mediating ART resistance and endocytosis. This is an interesting question. However, other than MCA2 which was already known to be involved in mediating ART resistance (and was not tested for its involvement in endocytosis), none of their candidate proteins seem to be involved in mediating both these functions. The authors show that the other proteins tested appear important for parasite growth, with KIC12 and MyoF involved in mediating endocytosis. While these findings are novel, the KS approach used by the authors casts some doubt over the findings, and would mean that these findings would have to be re-tested with a more reliable approach, such as the GlmS system or generating a conditional knockout using the DiCre system. Despite not advancing our understanding of ART resistance, or identifying further players involved in this process, this manuscript provides two candidates that are involved in mediating endocytosis and a further candidate that appears to be important for parasite growth. Further work on these proteins will be required to understand their exact roles. As stated above, there is currently limited interest for these results (limited to researchers working on endocytosis in apicomplexan parasites and possibly the wider endocytosis field from an evolutionary perspective), however with further work, this could increase the impact and interest of this work substantially.

      The authors do not describe any novel methods/approaches within this work.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements

      We thank all four reviewers for their helpful and constructive comments. We have gone through each and every comment and proposed how we would address each point raised by the reviewers. We are confident our proposed revisions are feasible within a reasonable and expected time frame. Some of the comments regarding minor typo/aesthetics and extra references have already been addressed in the transferred manuscript. The changes are highlighted in yellow in the transferred manuscript.

      2. Description of the planned revisions

      Reviewer #1

      Major points:

      1. The presented work itself (Figures 1-4) does not need significant adjustments prior to publication, in my view, with only a few points to address. However, the work in Figure 5 doesn't really support the claims the authors make on its own, and would require some additional experiments or at the very least discussion of the caveats to its current form.

      We thank the reviewer for these comments and will follow the reviewer’s suggestion by discussing the caveats regarding the interpretation of Figure 5. We will also add to the discussion to suggest future research approaches beyond the scope of this manuscript that would address the functional importance of localised mRNA translation. We will briefly mention in the discussion methods such as the quantification of the mRNA foci and the disruption of the mRNA localisation signals to disrupt localised translation and the use of techniques such as Sun-Tag (Tanenbaum et al, 2014) and FLARIM (Richer et al, 2021) to visualise local translation directly.

      Tanenbaum et al, 2014 DOI: 10.1016/j.cell.2014.09.039

      Richer et al, 2021 DOI: 10.1101/2021.08.13.456301

      1. Localized glia transcripts, are they "glial/CNS/PNS" significant or are they similar to other known datasets of protrusion transcriptomes? The authors compared their 4801 "total" localized to a local transcriptome dataset from the Chekulaeva lab finding that a significant fraction are localized in both. As the authors note, this is in good agreement with a recent paper from the Taliaferro lab showing conservation of localization of mRNAs across different cell types. What the authors haven't done here, is further test this by looking at other non-neuronal projection transcriptomic datasets (for example Mardakheh Developmental Cell 2015, among others). If the predicted glia-localized processes are similar to non-neuronal processes transcriptomes, this would further strengthen this claim and rule out some level of CNS/PNS derived lineage driving the similarities between glia and neuronal localized transcripts.

      This is a good point and we thank the reviewer for pointing out this interesting cancer data set. We will do as the reviewer suggests and intersect our data with Mardakheh Dev Cell 2015 to test the further generality of localisation in neurons and glia, in other cell types. Specifically, we plan to intersect both the glial (this study) and neuronal (von Kuegelgen & Chekulaeva, 2020) datasets with protrusive breast cancer cells (Mardakheh et al, 2015).

      von Kuegelgen & Chekulaeva, 2020 DOI: 10.1002/wrna.1590

      Mardakheh et al, 2015 DOI: 10.1016/j.devcel.2015.10.005

      1. The presentation/discussion around Figure 3 is a bit weaker than other parts of the manuscript, and it doesn't really contribute to the story in its current form. Notably there is no discussion about the significance of glia in neurological disorders until the very end of the manuscript (page 21), meaning that when it's first brought up, it just sits there as a one-off side point. The authors might consider strengthening/tightening up the discussion here, if they really want to keep it as a solo main figure rather than integrating it somewhere else/putting it into supplemental. In my view, Figures 2 & 3 should be merged into something a bit more streamlined.

      This is a good point. We plan to strengthen the presentation of Figure 3 and discussion of the significance of glia in neurological disorders by adding a description of the Figure in the Results section and highlighting the significance of glia in nervous system disorders in the Discussion section.

      1. Why aren't there more examples of different mRNAs in Figure 4? Seems a waste to kick them all to supplemental.

      We agree that it could be helpful to show different expression patterns in the main figure. To address this point we will add Pdi (Fig. S4D), which shows mRNA expression in both the glia and the surrounding muscle cell. This pattern is in contrast to Gs2, which is highly specific to glial cells. We will also note that although pdi mRNA is present in both the glia and muscle, Pdi protein is only abundant in the glia, suggesting that translation of pdi mRNA to protein is regulated in a cell-specific manner.

      1. The plasticity experiments, while creative, I think need to be approached far more cautiously in their interpretation. Given that the siRNAs will completely deplete these mRNAs, it really needs to be stressed that any/all of the effects seen could just be the result of "defective" or "altered" states in this glial population, which has spillover effects on plasticity at the NMJ. Without directly visualizing if these mRNAs are locally translated in these processes and assessing if their translation is modulated by their plasticity paradigm, all these experiments can say is that these RNAs are needed in glia to modulate ghost bouton formation in axons. This represents the weakest part of this manuscript, and the part that I feel does not actually back up the claims currently being made. Without any experiments to A. quantify how much of these transcripts are localized vs in the cell body of these glia, and B. visualize/quantify the translation of these mRNAs during baseline and during plasticity, the authors cannot use these data to claim that localized mRNAs are required for synaptic plasticity.

      We are grateful to the reviewer for pointing out that we were not precise enough in defining our interpretation of the structural plasticity assay. We did not intend to claim that our results show that local translation of these transcripts is necessary for plasticity, only that these transcripts are localized and are required in the glia for plasticity in the adjacent neuron (in which the transcript levels are not disrupted in the experiment). Definitively proving that these transcripts are required locally and translated in response to synaptic activity would require genetic/chemical perturbations and imaging assays that would require a year or more to complete, so are beyond the scope of this manuscript. To address this point, we will clarify that the results do not show that localized transcripts are required, only that the transcripts are required somewhere specifically in the glial cell (without affecting the neuron level), and we can indeed show in an independent experiment that there are localized transcripts.

      Reviewer #2

      Major points:

      1. The authors analyse the 1700 shortlisted genes for Gene Ontology and associations with autism spectrum disorder, leading to interesting results. However, it is not clear to what extent the enrichments they observe are driven by their presumptive localization or if the associations are driven to a significant extent by the presence of these genes in the selected cell types in the Fly Cell Atlas. One way to address this would be to perform the GO and SFARI analysis on genes that are expressed in the same cells in the Fly Cell Atlas but were not shortlisted from the mammalian cell datasets - the results could then be compared to those obtained with the 1700 localized transcripts.

      This is a fair point raised by the reviewer as genes involved in neurological disease such as Autism Spectrum Disorder may be enriched in CNS/PNS cell types. We will follow the reviewer’s suggestion to perform GO and SFARI gene enrichment analysis in genes that were not shortlisted for presumptive glial localisation.
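The control comparison described above can be framed as an over-representation test against a matched background. The sketch below is illustrative only: the function name and every count are hypothetical, and a one-sided hypergeometric test is assumed rather than taken from the manuscript. Running it once on the shortlisted genes and once on the non-shortlisted genes expressed in the same Fly Cell Atlas cell types would give the two results the reviewer asks to compare.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of background genes (e.g. genes expressed in the cell types)
    annotated:  background genes carrying the annotation (e.g. SFARI genes)
    sample:     size of the gene list being tested (e.g. the shortlist)
    hits:       annotated genes observed in the tested list
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts, chosen only to make the example run.
background_genes = 10000
sfari_in_background = 500
shortlisted = 1700
sfari_in_shortlist = 120

p_value = hypergeom_enrichment_p(
    background_genes, sfari_in_background, shortlisted, sfari_in_shortlist
)
print(f"enrichment p-value: {p_value:.3g}")
```

Using exact integer binomials keeps the example free of external dependencies; a statistics library would be the more usual choice for a full analysis.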

      1. Although the authors attempt to justify its inclusion, I'm not convinced why it was important to use the whole cell transcriptome of perisynaptic Schwann cells as part of the selection process for localizing transcripts. Including this dataset may reduce the power of the pipeline by including mRNAs that are not localized to protrusions. How many of the shortlisted 1700 genes, and how many of the 11 glial localized mRNAs in Table 5, would be lost if the whole cell transcriptome were excluded. More generally, what is the distribution of the 11 validated localizing transcripts in each dataset in Table 4? This information might be valuable for determining which dataset(s), if any, has the best predictive power in this context.

      We thank the reviewer for raising this point, which we will address with further analysis and adding to the discussion. We propose to address the criticism by running our analysis pipeline without the inclusion of the dataset using Perisynaptic Schwann Cells (PSCs) and then intersect with the PSCs-expressed genes, since their functional similarity with polarised Drosophila glial cells is highly relevant. We also agree with the reviewer that it would be a useful control for us to assess the ‘predictive power’ of each glial dataset by calculating their contribution to the shortlisted 1,700 glial localised transcripts and to the 11 experimentally validated transcripts via in situ hybridisation. To address this point, we plan to add this information in the revised manuscript.
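The leave-one-dataset-out control described above can be sketched with plain set operations. This is a minimal illustration under stated assumptions: the selection rule (a gene is shortlisted if it is detected in at least `min_support` datasets), the dataset names, and the gene identifiers are all hypothetical and stand in for the actual pipeline criteria.

```python
def shortlist(datasets, min_support):
    """Return genes detected in at least `min_support` of the datasets."""
    counts = {}
    for genes in datasets.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, n in counts.items() if n >= min_support}


def leave_one_out(datasets, min_support, validated):
    """For each dataset, count what is lost from the shortlist when it is excluded."""
    full = shortlist(datasets, min_support)
    report = {}
    for name in datasets:
        reduced = {k: v for k, v in datasets.items() if k != name}
        kept = shortlist(reduced, min_support)
        report[name] = {
            "shortlist_lost": len(full - kept),
            "validated_lost": len((validated & full) - kept),
        }
    return report


# Toy data: three hypothetical datasets and two hypothetical validated genes.
toy_datasets = {
    "PSC_whole_cell": {"a", "b", "c"},
    "astrocyte_protrusion": {"a", "b", "d"},
    "oligodendrocyte_protrusion": {"a", "c", "d"},
}
toy_validated = {"a", "b"}

for name, loss in leave_one_out(toy_datasets, 2, toy_validated).items():
    print(name, loss)
```

A dataset whose removal loses few shortlisted or validated genes contributes little predictive power under this rule, which is the comparison the reviewer proposes.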

      1. Did the authors check if any of the RNAi constructs are reducing levels of the target mRNA or protein? Doing so would strengthen the confidence in these important results significantly. In any case, the authors should also mention the caveat of potential off-target effects of RNAi.

      We thank the reviewer for their useful comment and agree that the extent to which the RNAi expression reduces the levels of mRNA is not specifically known. We will add a FISH experiment on lac, pdi and gs2 RNAi showing very strong reduction in mRNA levels. We will also add an explanation of the caveats of the use of the RNAi system to the discussion.

      1. Methods: what is the justification for assuming that if the RNAi cross caused embryonic or larval lethality then the 'next most suitable' RNAi line is reporting on a phenotype specific to the gene. If the authors want to claim the effect is associated with different degrees of knockdown they should show this experimentally. An alternative explanation is that the line used for phenotypic analysis in glia is associated with an off-target effect.

      We thank the reviewer for this comment. We agree that off-target effects cannot in principle be completely ruled out without considerable additional experimental analysis beyond the scope of this manuscript. To address the criticism, we will remove the expression data of the lines that cause lethality and revise the discussion to explain that the level of knockdown in each line is unknown and would require further experimental exploration.

      Minor points:

      1. It would be helpful to have in the Introduction (rather than the Results, as is currently the case) an operational definition of mRNA localization in the context of the study. And is it known whether or not localization in protrusions is the norm in mammalian glia or the Drosophila larval glia? I ask because it may be that almost all mRNAs diffuse into the protrusion, so this is not a selective process. One interesting approach to test this idea might be to test if the 1700 shortlisted transcripts have a significant underrepresentation of 'housekeeping' functions.

      We thank the reviewer for this excellent suggestion. To address the comment, we will move our explanation of the operational definition of mRNA localization to the Introduction. We will also perform enrichment analysis of housekeeping genes within 1,700 shortlisted transcripts compared to the transcriptome background, as the reviewer suggested.

      Reviewer #3

      Major points:

      1. The authors have pooled data from different studies across different type of glial cells performed from in vitro to in vivo. While pooling datasets may reveal common transcripts enriched in processes, this may not be the best approach considering these are completely different types of glial cells with distinct function in neuronal physiology.

      We thank the reviewer for highlighting the need for us to further justify why we pooled datasets. We will revise the manuscript to better emphasise that the overarching goal of our study was to try to discern a common set of localised transcripts shared between the cells. The problem with analysing and comparing individual data sets is that much of the variation may be due to differences in the methods used and amount of material, rather than differences in the type of cells used. We will revise the discussion to make this point and plan to explain that our approach corresponds well with a previous publication pooling localised mRNA datasets in neurons (von Kuegelgen & Chekulaeva, 2020).

      von Kuegelgen & Chekulaeva, 2020 DOI: 10.1002/wrna.1590

      1. It is important to note the limitations of the study. For example, DeSeq2 is biased for highly expressed transcripts. How robust was the prediction for low abundance transcripts?

      The presented 1,700 transcripts were shortlisted based on their presence and expression level (TPM) in glial protrusions rather than their relative enrichment. Nevertheless, the reviewer makes a valid criticism of our use of DESeq2, where we compared enriched transcripts in glial and neuronal protrusions in Figure 1D. To address this point we will discuss this caveat in the relevant section.

      The issue raised regarding low abundance transcript prediction raises an important question: does the likelihood of localisation to cell extremities correlate with mRNA abundance? We have already partially addressed this point, since our analysis of the fraction of localised transcripts per expression level quantiles shows only limited correlation. To address this comment, we will add these results in the revised manuscript as a supplementary figure.
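The quantile analysis mentioned above can be sketched as follows. This is an illustrative outline, not the authors' code: the function name, the equal-sized rank-based binning, and the toy TPM values are assumptions. A roughly flat profile across bins would indicate limited correlation between abundance and localisation.

```python
def localized_fraction_by_quantile(tpm, localized, n_bins=4):
    """Fraction of localised genes within each expression-level bin.

    tpm:       dict mapping gene -> expression level (TPM)
    localized: set of genes called as localised
    n_bins:    number of equal-sized bins, ordered from lowest to highest expression
    """
    ranked = sorted(tpm, key=tpm.get)
    n = len(ranked)
    fractions = []
    for i in range(n_bins):
        chunk = ranked[i * n // n_bins:(i + 1) * n // n_bins]
        hits = sum(1 for gene in chunk if gene in localized)
        fractions.append(hits / len(chunk) if chunk else float("nan"))
    return fractions


# Toy data with hypothetical gene names and TPM values.
toy_tpm = {"g1": 1, "g2": 2, "g3": 5, "g4": 8, "g5": 13, "g6": 21, "g7": 34, "g8": 55}
toy_localized = {"g2", "g3", "g6", "g7"}

print(localized_fraction_by_quantile(toy_tpm, toy_localized))
```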

      1. The authors identify 1,700 transcripts that they classify as "predicted to be present" in the projections of the Drosophila PNS glia. This was based on the comparison to all the mammalian glial transcripts. Since the authors have access to a transcriptomic study from Perisynaptic Schwann cells (PSCs), the nonmyelinating glia associated with the NMJ isolated from mice; it would be more convincing to then validate the extent of overlap between Drosophila peripheral glial with the mammalian PSCs. This may reveal conserved features of localized transcripts in the PNS, particularly associated with the NMJ function.

      Thank you for the valuable suggestion. A similar point was also raised by [Reviewer #2 - Major point 2] to re-run our pipeline excluding the PSCs dataset and intersect with the PSC transcriptome post-hoc. Please see the above section for our detailed response.

      1. Fig 2: What is the extent of overlap between the translating fractions versus the localized fraction? It will be informative to perform the functional annotation of the translating glial transcripts as identified from Fig 1D.

      This is an interesting question. To address this point, we plan to: (i) compare transcripts that are translated vs. localised in glial protrusions, and (ii) perform functional annotation enrichment analysis on the translated fraction of genes.

      1. "We conclude predicted group of 1,700 are highly likely to be peripherally localized in Drosophila cytoplasmic glial projections". To validate their predictions, the authors test some of these candidates in only one glial cell type. It might be worthy to extend this for other differentially expressed genes localized in another glial type as well.

      The presented in vivo analyses made use of the repo-GAL4 driver, which is active in all glial subtypes, including subperineurial, perineurial and wrapping glia that make distal projections to the larval neuromuscular junction. We agree that subtype-specific analysis would be highly informative, but we believe this is outside the scope of the current work, where we aimed to identify conserved localised transcriptomes across all glial subtypes. Nevertheless, to address the comment, we plan to further clarify our use of the pan-glial repo-GAL4 driver in the Results and Methods sections of the revised manuscript.

      1. Figure 5: The authors perform KD of candidate transcripts to test the effect on synapse formation. However, these are KD with RNAi that spans across the entire cell. To make the claim about the importance of "target" RNA localization in glia stronger, ideally, they should disrupt the enrichment specifically in the glial protrusions and test the impact on bouton formation. Do these three RNAs have any putative localization elements?

      We agree with the reviewer that we would ideally test the effect of disrupting mRNA localization (and therefore localised translation). However, we feel these experiments are beyond the scope of the current study, as they will require a long road of defining localisation signals that are small enough to disrupt without affecting other functions. To address this comment, we will revise the Discussion section to mention those difficulties explicitly and clarify the limitations of the approach used in our study for greater transparency.

      Reviewer #4

      Major points:

      1. The authors use FISH to validate the glial expression of their target genes, though these experiments are not quantified, and no controls are shown. The authors should provide a supplemental figure with "no probe" controls, and/or validate the specificity of the probe via glial knockdown of the target gene (see point 2). Furthermore, these data should be quantified (e.g. number of puncta colocalized with NMJ glia membranes).

      Thank you for requesting further information regarding the YFP smFISH probes. We have validated the specificity and sensitivity of the YFP probe in our recent publication (Titlow et al, 2023, Figure 1 and S1). Specifically, we demonstrated the lack of YFP probe signal from wild-type untagged biosamples and showed colocalization of YFP spots with additional probes targeting the endogenous exon of the transcript. Nevertheless, we will address this comment by adding control image panels of smFISH in wild-type (OrR) neuromuscular junction preparations.

      Titlow et al, 2023 DOI: 10.1083/jcb.202205129

      1. For the most part, the authors only use one RNAi line for their functional studies, and they only show data for one line, even if multiple were used. To rule out potential false negatives, the authors should leverage their FISH probes to show the efficacy of their knockdowns in glia. This would serve the dual purpose of validating the new probes (see point 1).

      Thank you for the suggestion. This point was also raised by [Reviewer #2 - Major point 3]. Please see above for our detailed response.

      1. In Figure 5 E, given the severe reduction in size in the stimulated Pdi KD animals, the authors should show images of the unstimulated nerve as well. Do the nerve terminals actually shrink in size in these animals following stimulation, rather than expand? The NMJ looks substantially smaller than a normal L3 NMJ, though their quantification of neurite size in F suggests they're normal until stimulation.

      We share the same interpretation of the data with the reviewer that the neurite area is reduced post-potassium stimulation in pdi knockdown animals. We will follow the reviewer’s suggestion and add an image showing unstimulated neuromuscular junctions.

      Minor points:

      1. The authors claim that there is an enrichment of ASD-related genes in their final list of ~1400 genes that are enriched in glial processes. It is well-appreciated that synaptically-localized mRNAs are generally linked to ASDs. Can the authors comment on whether the transcripts localized to glial processes are even more linked to ASDs and neurological disorders than transcripts known to be localized to neuronal processes?

      This is an interesting point. To address the comment, we will add a comparison of the degree of enrichment of ASD-related genes in neurite vs. glial protrusions in the revised manuscript.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1

      1. The use of blue/green or blue/green/magenta is difficult to resolve in some places. Swapping blue for cyan would greatly aid in visualizing their data.

      This comment is much appreciated. We have swapped blue for cyan in Figures 4 and S4. We have also changed Figure S1 to increase contrast and visibility as per reviewer’s comment.

      1. Make the colouring/formatting of the tables more consistent; it's distracting when it's constantly changing (also there is no need for a blue background, just use a basic white table).

      This comment is much appreciated. We have applied a consistent colour palette to the Tables without background colourings and made the formatting uniform.

      Reviewer #2

      1. Introduction: 'Asymmetric mRNA localization is likely to be as important in glia, as it is in neurons,...'. Remove commas

      Thank you for pointing this mistake out. We have made the corresponding edits.

      Reviewer #3

      1. RNA localization in oligodendrocytes has been well studied and characterized. The authors should cite and discuss those papers (PMID: 18442491; PMID: 9281585).

      We thank the reviewer for this useful suggestion. We have added these references to the paper.

      Reviewer #4

      1. In Figure 5D, the authors should include a label to indicate that these images are from an unstimulated condition.

      We thank the reviewer for pointing this out. We have added the label as requested.

      1. The authors are missing a number of key citations for studies that have explored the functional significance of mRNA trafficking in glia, and those that have validated activity-dependent translation:

      - https://pubmed.ncbi.nlm.nih.gov/18490510/

      - https://pubmed.ncbi.nlm.nih.gov/7691830/

      - https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001053

      - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7450274/

      - https://pubmed.ncbi.nlm.nih.gov/36261025/

      We thank the reviewer for the comment. We have added these references to the text.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #4

      Evidence, reproducibility and clarity

      This article examines the cellular processes that predispose cells to nuclear blebbing and DNA damage in response to lamin and chromatin perturbations. The authors show key differences in these two types of perturbation and demonstrate a role for actin contractility. The experiments are well controlled and the data analysis generally rigorous. However, prior to acceptance, a number of issues must be fixed to improve the manuscript. I do not know the field sufficiently well to judge the novelty of the data.

      Major issues:

      • page 7, bottom: The authors state that measuring nuclear height gives an indication of confinement and force balance. But, if the nuclear mechanical properties have changed, then the nuclear height could change without any change in contractility. So, the authors would need to also verify that the level of contractility hasn't changed and that the mechanical properties haven't changed to really confirm that the cell height is a good measure of confinement. The level of contractility can be assessed by staining for pMLC. The nuclear mechanical properties may have been measured by others.
      • In general, are the changes in contractility resulting from drug treatments sufficiently large to deform the nucleus? Can the authors show a time course of nuclear height in response to a treatment for WT for example? This would allow to link contractility to nuclear height.
      • Page 9: The authors do not find any change in nuclear shape. Can they measure shape pre/post treatment on the same cells? It could be that the effect is lost in variability unless you do paired measurements?
      • Page 11: the authors find nuclear ruptures unchanged in LMNA-/- even when there is no contractility. They then state: "We hypothesized that LMNA-/- nuclei do not show bleb-based behaviors because this perturbation cannot, due to reported disrupted nuclear-actin connections". I do not understand this sentence.
      • To characterise actin contractility better, it would be good to present images of the actin cables in each condition and pre/post treatments. This would allow to visually assess whether the morphology of the F-actin cytoskeleton has changed. This is one of the main topics of the study and as such it should be examined.
      • On all bar charts, the authors should indicate: the number of independent experiments, the number of cells examined.
      • I find the diagrams on Fig 1A, 2A etc do not help to illustrate what the authors think is happening. Can they redraw them in a more informative way?
      • The abstract, introduction, and discussion are overly long and lack focus. These should be rewritten succinctly.

      Minor issues:

      • page 4: inhibitors of Rho-kinase will also modulate actin polymerisation indirectly through the action on Lim-kinase and cofilin.
      • page 5, second paragraph: the authors should state that they are measuring the frequency of ruptures. At first, I thought this might be a mechanical strain.
      • Page 7: In general, it may be useful to discuss the temporal evolution of the c/n and the circularity side by side. The change in circularity over time could be an indicator of mechanical strain, while the c/n would report on any transient loss of integrity of the nuclear membrane.
      • Fig 1B: it would be nice to present the time course of the c/n as well.
      • Fig S1: it might be interesting to characterise the dynamics/amplitude of the c/n for the different conditions. There doesn't appear to be any difference between the nuclear blebbing rupture and the non blebbing rupture. This suggests that the two phenomena (nuclear blebbing and nuclear rupture) are independent: i.e. rupture is not causally linked to blebbing.

      Significance

      This article examines the cellular processes that predispose cells to nuclear blebbing and DNA damage in response to lamin and chromatin perturbations. The authors show key differences in these two types of perturbation and demonstrate a role for actin contractility. The experiments are well controlled and the data analysis generally rigorous. However, prior to acceptance, a number of issues must be fixed to improve the manuscript. I do not know the field sufficiently well to judge the novelty of the data.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary

      While DNA sequence divergence, differential expression, and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study, the authors computationally predict HSlncRNAs as well as their DNA Binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analysis is straightforward and after identifying these regions/HSlncRNAs the authors examined their effects using different external datasets.

      Strengths/weaknesses

      By and large, the analysis performed is dependent on their ability to identify HSlncRNAs and their DBS. I think that they have done a good job of showing the performance metrics of their methods in previous publications. Thereafter, they perform a series of enrichment-type analyses that have been used in the field for quite a while now to look at tissue-specific enrichment, or region-specific enrichment, or functional enrichment, and I think these have been carried out well. The authors achieved the aims of their work. I think one of the biggest contributions that this paper brings to the field is their annotation of these HSlncRNAs. Thus a major revisionary effort could be spent on applying their method to the latest genomes that have been released so that the community could get a clean annotation of newly identified HSlncRNAs (see comment 2).

      Comments

      1) Though some of their results about certain HSlncRNAs having DBSs in all genes is rather surprising/suspicious, I think that broadly their process to identify and validate DBSs is robust, they have multiple lines of checks to identify such regions, including functional validation. These predictions are bound to have some level of false positive/negative rate and it might be nice to restate those here and on what experiment/validation data these were conducted. However, the rest of their analysis comprises different types of enrichment analysis which shouldn't be affected by outlier HSlncRNAs if indeed their FPR/FNR are low.

      2) There are now several new genomes available as part of the Zoonomia consortium and 240 Primate consortium papers released. These papers have re-examined some annotations such as Human Accelerated Regions (HARs) and found with a larger dataset as well as better reference genomes, that a large fraction of HARs were actually incorrectly annotated - that is that they were also seen in other lineages outside of just the great apes. If these papers have not already examined HSlncRNAs, the authors should try and re-run the computational predictions with this updated set and then identify HSlncRNAs there. This might help to clarify their signal and remove lncRNAs that might be present in other primates but are somehow missing in the great apes. This might also help to mitigate some results that they see in section 3 of their paper in comparing DBS distances between archaics and humans.

      3) The differences between the archaic hominins in their DBS distances to modern humans are a bit concerning. At some level, we expect these to be roughly similar when examining African modern humans and perhaps the Denisovan being larger when examining Europeans and Asians, but they seem to have distances that aren't expected given the demography. In addition, from their text for section 3, they begin by stating that they are computing two types of distances but then I lost track of which distance they were discussing in paragraph 3 of section 3. Explicitly stating which of the two distances in the text would be helpful for the reader.

      (1) According to Figure 1A (which follows Meyer et al., 2012, Prüfer et al., 2013, and Prüfer et al., 2017), the phylogenetic distance from modern humans to Denisovans is shorter than the distance to the Altai Neanderthal. However, according to the same studies, the Denisovan branch is more remote from modern humans than the Altai Neanderthal branch. Thus, it is not unreasonable to find that 2514 and 1256 DBSs have distances > 0.034 in genes in Denisovans and Altai Neanderthals, respectively. Both the phylogenetic distances and the DBS distances probably depend considerably on the sampled Altai Neanderthal and Denisovan genomes, since both populations existed for a long time. When new samples are obtained, these distances may change somewhat.

      (2) Regarding “they are computing two types of distances but then I lost track of which distance they were discussing in paragraph 3 of section 3”, the second type of distance was discussed in section 3, and the distances computed in the first way were not analyzed further because “This defect may be caused by that the human ancestor was built using six primates without archaic humans”.

      4) Isn't the correct control to examine whether eQTLs are more enriched in HSlncRNA DBSs a set of transcription factor binding sites? I don't think using just promoter regions is a reasonable control here. This does not take away from the broader point however that eQTLs are found in DBSs and I think they can perform this alternate test.

      Indeed, the TF-TFBS and lncRNA-DBS relationships are comparable, and which of the two contains more QTLs is an interesting question. In this sense, it is reasonable to use TFBSs as the control. However, we did not perform this comparison for three reasons. First, most TFBSs are predicted by varied methods, which raises concerns about the reliability of comparing two sets of predictions. Second, most QTLs in DBSs are mQTLs, whereas most QTLs in TFBSs are eQTLs. Third, a greater portion of TFBSs than DBSs probably lie outside promoters, and the running time of LongTarget prevented us from predicting DBSs truly genome-wide. Nevertheless, this is an interesting question that deserves further exploration.

      5) In the discussion, they highlight the evolution of sugar intake, which I'm not sure is appropriate. This comes not from GO enrichment but rather from a few genes that are found at the tail of their distribution. While these signals may be real, the evolution of traits is often highly polygenic and they don't see this signal in their functional enrichment. I suggest removing that line. Moreover, HSlncRNAs are ones that are unique across a much longer time frame than the transition to agriculture, which is when sugar intake rose greatly. Thus, it's unlikely that enrichment for something that arose in the past 6000-7000 years would be seen in an annotation that is designed to detect human-chimp or human-neanderthal level divergence.

      Multiple sugar metabolism-related pathways, including “glucose homeostasis” and “glucose metabolic process”, are found to be enriched only in Altai Neanderthal but not in chimpanzees (Figure 2). Indeed, HS lncRNAs span a much longer time frame than the transition to agriculture. However, given that apes and monkeys know how to pick ripe, sugar-rich fruits at the right time and place, we conjecture that archaic humans, as hunter-gatherers, could effectively exploit natural sugars.

      Reviewer #2 (Public Review):

      Lin et al attempt to examine the role of lncRNAs in human evolution in this manuscript. They apply a suite of population genetics and functional genomics analyses that leverage existing data sets and public tools, some of which were previously built by the authors, who clearly have experience with lncRNA binding prediction. However, I worry that there is a lack of suitable methods and/or relevant controls at many points and that the interpretation is too quick to infer selection. While I don't doubt that lncRNAs contribute to the evolution of modern humans, and certainly agree that this is a question worth asking, I think this paper would benefit from a more rigorous approach to tackling it.

      At this point, my suggestions are mostly focused on tightening and strengthening the methods; it is hard for me to predict the consequence of these changes on the results or their interpretation, but as a general rule I also encourage the authors to not over-interpret their conclusions in terms of what phenotype was selected for when as they do at certain points (eg glucose metabolism).

      I note some specific points that I think would benefit from more rigorous approaches, and suggest possible ways forward for these.

      1) Much of this work is focused on comparing DNA binding domains in human-unique long-noncoding RNAs and DNA binding sites across the promoters of genes in the human genome, and I think the authors can afford to be a bit more methodical/selective in their processing and filtering steps here. The article begins by searching for orthologues of human lncRNAs to arrive at a set of 66 human-specific lncRNAs, which are then characterised further through the rest of the manuscript. Line 99 describes a binding affinity metric used to separate strong DBS from weak DBS; the methods (line 432) describe this as being the product of the DBS or lncRNA length times the average Identity of the underlying TTSs. This multiplication, in fact, undoes the standardising value of averaging and introduces a clear relationship between the length of a region being tested and its overall score, which in turn is likely to bias all downstream inference, since a long lncRNA with poor average affinity can end up with a higher score than a short one with higher average affinity, and it's not quite clear to me what the biological interpretation of that should be. Why was this metric defined in this way?

      Length is an important metric of DBS, but it has a defect – a triplex of 100 bp may have 50% or 70% of nucleotides bound; in the two situations, the binding affinity of DBD and DBS is very different.
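      The reviewer's concern about the length-weighted score can be made concrete with a small numerical sketch. The function name and the two example regions below are hypothetical, chosen only to show how multiplying length by mean identity lets a long, weakly matching region outrank a short, strongly matching one; they are not taken from the manuscript.

```python
def length_weighted_score(length_bp: int, mean_identity: float) -> float:
    """Score as described in the review: region length times mean TTS identity."""
    return length_bp * mean_identity


# Hypothetical regions, chosen only to illustrate the length effect.
long_weak = {"length_bp": 200, "mean_identity": 0.5}
short_strong = {"length_bp": 100, "mean_identity": 0.75}

score_long = length_weighted_score(**long_weak)
score_short = length_weighted_score(**short_strong)

# The longer region wins despite its lower mean identity.
print(score_long, score_short, score_long > score_short)
```

      At equal length the same score does separate a triplex with 50% identity from one with 70%, which is the case the authors describe; the two effects are confounded only when lengths differ.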

      2) There is also a strong assumption that identified sites will always be bound (line 100), which I do not think is well supported by the additional evidence (lines 109-125). The authors show that predicted NEAT1 and MALAT1 DBS overlap experimentally validated sites for NEAT1, MALAT1, and MEG3, but this is not done systematically, or genome-wide, so it's hard to know if the examples shown are representative, or a best-case scenario.

      More details are described in the citation Wen et al. 2022. We will put the sites into Supplementary Tables in the revised version.

      It's also not quite clear how overlapping promoters or TSS are treated - are these collapsed into a single instance when calculating genome-wide significance? If, eg, a gene has five isoforms, and these differ in the 3' UTR but their promoter region contains a DBS, is this counted five times, or one? Since the interaction between the lncRNA and the DBS happens at the DNA level, it seems like not correcting for this uneven distribution of transcripts is likely to skew results, especially when testing against genome-wide distributions, eg in the results presented in sections 5 and 6. I do not think that comparing genes and transcripts putatively bound by the 40 HS lncRNAs to a random draw of 10,000 lncRNA/gene pairs drawn from the remaining ~13500 lncRNAs that are not HS is a fair comparison. Rather, it would be better to do many draws of 40 non-HS lncRNAs and determine an empirical null distribution that way, if possible actively controlling for the overall number of transcripts (also see the following point).

      (1) If, say, three transcripts of a gene share the same promoter region (i.e., they have the same TSS) and differ only in the 3’ UTR, the promoter region was used to predict DBSs only once. Otherwise, if the three transcripts have different TSSs, all three promoter regions were used to predict DBSs.

      (2) A gene may have many DBSs if it has many transcripts, or few ones if it has just a few transcripts. We did not correct for this uneven distribution of transcripts, because our GTEx analysis was on the transcript level; it is well recognized that transcripts of the same gene can be expressed in different tissues.

      (3) We randomly sampled a pair consisting of a non-HS lncRNA and a transcript 10,000 times (i.e., 10,000 pairs). We agree that multiple draws of 40 non-HS lncRNAs should be made to make the statistics more robust.
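      The repeated-draw null distribution suggested by the reviewer could look roughly like the following sketch. It assumes that each lncRNA has already been reduced to a single numeric summary; the pool values, the observed value, and the function name `empirical_p_value` are all hypothetical placeholders rather than anything from the manuscript.

```python
import random


def empirical_p_value(observed, pool, set_size, n_draws, statistic, seed=0):
    """One-sided empirical p-value from repeated random draws of `set_size` items.

    `pool` holds one value per non-HS lncRNA; `statistic` maps a drawn set to a number.
    The +1 terms keep the estimate away from an impossible p-value of zero.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(pool, set_size)  # draw without replacement
        if statistic(sample) >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)


def mean(values):
    return sum(values) / len(values)


# Hypothetical per-lncRNA summary values for roughly 13,500 non-HS lncRNAs.
pool = [float(i % 50) for i in range(13500)]

# Hypothetical observed mean for the 40 HS lncRNAs.
p = empirical_p_value(observed=45.0, pool=pool, set_size=40,
                      n_draws=1000, statistic=mean)
print(p)
```

      Controlling for the number of transcripts per gene, as the reviewer also asks, would require drawing from strata of the pool rather than from the pool as a whole; that refinement is not shown here.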

      3) Thresholds for statistical testing are not consistent, or always well justified. For instance, in line 142 GO testing is performed on the top 2000 genes (according to different rankings), but there's no description of the background regions used as controls anywhere, or of why 2000 genes were chosen as a good number to test? Why not 1000, or 500? Are the results overall robust to these (and other) thresholds? Then line 190 the threshold for downstream testing is now the top 20% of genes, etc. I am not opposed to different thresholds in principle, but they should be justified.

      The over-representation analysis using g:Profiler was performed taking the whole genome as the background. Analyzing more DBSs (especially weak DBSs) would generate more results, but the results could be less reliable. Thus, there is a trade-off between analyzing fewer DBSs with relatively high reliability and analyzing more DBSs with relatively low reliability. Inevitably, the handling of this trade-off is somewhat subjective, and carefully comparing the two classes of DBSs could itself be an independent question. Although weak DBSs were not systematically analyzed, the results from the strong DBSs undoubtedly suggest that HS lncRNAs have contributed greatly to human evolution.
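      To illustrate how the choice of cutoff and background enters an over-representation test, here is a minimal sketch of a one-sided hypergeometric test evaluated at several top-N thresholds. It is a generic textbook calculation, not g:Profiler's implementation, and the background size, term size, and overlap counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background_size, term_size, selected_size, overlap):
    """P(X >= overlap) when `selected_size` genes are drawn without replacement
    from a background in which `term_size` genes carry the annotation."""
    total = comb(background_size, selected_size)
    upper = min(term_size, selected_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


background = 20000                        # hypothetical number of background genes
term_size = 300                           # hypothetical genes annotated to one term
overlaps = {500: 15, 1000: 25, 2000: 45}  # hypothetical hits at each top-N cutoff

for top_n, hits in overlaps.items():
    p = hypergeom_enrichment_p(background, term_size, top_n, hits)
    print(f"top {top_n}: overlap {hits}, p = {p:.3g}")
```

      Reporting the p-value at each cutoff in this way is one simple means of showing whether a result is robust to the threshold, which is the question the reviewer raises.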

      Likewise, comparing Tajima's D values near promoters to genome-wide values is unfair, because promoters are known to be under strong evolutionary constraints relative to background regions; as such it is not surprising that the results of this comparison are significant. A fairer comparison would attempt to better match controls (eg to promoters without HS lncRNA DBS, which I realise may be nearly impossible), or generate empirical p-values via permutation or simulation.

      We examined Tajima’s D in DBSs (Supplementary Figure 9) and in HS lncRNA genes (Supplementary Figure 18). In both cases, we compared the Tajima’s D values with the genome-wide background.

      4) There are huge differences in the comparisons between the Vindija and Altai Neanderthal genomes that to me suggest some sort of technical bias or the like is at play here. E.g., line 190 reports 1256 genes to have a high distance between the Altai Neanderthal and modern humans, but only 134 Vindija genes reach the same cutoff of 0.034. The temporal separation between the two specimens does not seem sufficient to explain this difference, nor the difference between the Altai Denisovan and Neanderthal results (2514 genes for Denisovan), which makes me wonder if it is a technical artefact relating to the quality of the genome builds? It would be worth checking.

      We used the same workflow (and the same cutoff 0.034) to analyze Vindija and Altai Neanderthal and Denisovan. If a smaller cutoff were used, one would see more Vindija genes. The question, again, is one of trade-offs. Analyzing the epigenome and epigenetic regulation in archaic genomes is an interesting direction, and many more studies are needed before related parameters and cutoffs can be set more reasonably.

      5) Inferring evolution: There are some points of the manuscript where the authors are quick to infer positive selection. I would caution that GTEx contains a lot of different brain tissues, thus finding a brain eQTL is a lot easier than finding a liver eQTL, just because there are more opportunities for it. Likewise, claims in the text and in Tables 1 and 2 about the evolutionary pressures underlying specific genes should be more carefully stated. The same is true when the authors observe high Fst between groups (line 515), which is only one possible cause of high Fst - population differentiation and drift are just as capable of giving rise to it, especially at small sample sizes.

    2. Reviewer #1 (Public Review):

      Summary

      While DNA sequence divergence, differential expression, and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study, the authors computationally predict HSlncRNAs as well as their DNA Binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analysis is straightforward and after identifying these regions/HSlncRNAs the authors examined their effects using different external datasets.

      Strengths/weaknesses

      By and large, the analysis performed is dependent on their ability to identify HSlncRNAs and their DBS. I think that they have done a good job of showing the performance metrics of their methods in previous publications. Thereafter, they perform a series of enrichment-type analyses that have been used in the field for quite a while now to look at tissue-specific enrichment, or region-specific enrichment, or functional enrichment, and I think these have been carried out well. The authors achieved the aims of their work. I think one of the biggest contributions that this paper brings to the field is their annotation of these HSlncRNAs. Thus a major revisionary effort could be spent on applying their method to the latest genomes that have been released so that the community could get a clean annotation of newly identified HSlncRNAs (see comment 2).

      Comments

      1) Though some of their results about certain HSlncRNAs having DBSs in all genes are rather surprising/suspicious, I think that broadly their process to identify and validate DBSs is robust; they have multiple lines of checks to identify such regions, including functional validation. These predictions are bound to have some level of false positive/negative rate and it might be nice to restate those here and on what experiment/validation data these were conducted. However, the rest of their analysis comprises different types of enrichment analysis which shouldn't be affected by outlier HSlncRNAs if indeed their FPR/FNR are low.

      2) There are now several new genomes available as part of the Zoonomia consortium and 240 Primate consortium papers released. These papers have re-examined some annotations such as Human Accelerated Regions (HARs) and found with a larger dataset as well as better reference genomes, that a large fraction of HARs were actually incorrectly annotated - that is that they were also seen in other lineages outside of just the great apes. If these papers have not already examined HSlncRNAs, the authors should try and re-run the computational predictions with this updated set and then identify HSlncRNAs there. This might help to clarify their signal and remove lncRNAs that might be present in other primates but are somehow missing in the great apes. This might also help to mitigate some results that they see in section 3 of their paper in comparing DBS distances between archaics and humans.

      3) The differences between the archaic hominins in their DBS distances to modern humans are a bit concerning. At some level, we expect these to be roughly similar when examining African modern humans and perhaps the Denisovan being larger when examining Europeans and Asians, but they seem to have distances that aren't expected given the demography. In addition, from their text for section 3, they begin by stating that they are computing two types of distances but then I lost track of which distance they were discussing in paragraph 3 of section 3. Explicitly stating which of the two distances in the text would be helpful for the reader.

      4) Isn't the correct control to examine whether eQTLs are more enriched in HSlncRNA DBSs a set of transcription factor binding sites? I don't think using just promoter regions is a reasonable control here. This does not take away from the broader point however that eQTLs are found in DBSs and I think they can perform this alternate test.

      5) In the discussion, they highlight the evolution of sugar intake, which I'm not sure is appropriate. This comes not from GO enrichment but rather from a few genes that are found at the tail of their distribution. While these signals may be real, the evolution of traits is often highly polygenic and they don't see this signal in their functional enrichment. I suggest removing that line. Moreover, HSlncRNAs are ones that are unique across a much longer time frame than the transition to agriculture, which is when sugar intake rose greatly. Thus, something that arose in the past 6000-7000 years is unlikely to show enrichment in an annotation that is designed to detect human-chimp or human-Neanderthal level divergence.

    1. Author Response

      The primary concern of Reviewer 1 is that Ne might affect gBGC and hence GC, and this might act as a confounding effect. The reviewer suggests that we should investigate how gBGC (with GC presumably as its proxy) might affect CAIS, and to what extent any relationship here could explain the relationship between CAIS and body mass. We believe that we have already dealt with this both in Supplementary Figure S5A (where we regret having inserted the wrong figure panel, a mistake we will correct), and its PIC-corrected counterpart in S5B. These two panels show (or will show) that CAIS is not correlated with GC. Note that we expect our genomic-GC-based codon usage expectations to reflect unchecked gBGC in an average genomic region, independently of whether that species has high or low Ne. Our working model is that mutation biases, including but not limited to the strength of gBGC, vary among species, and that they rather than selection determine each species’ genome-wide %GC. By correcting for genome-wide %GC, our CAIS thus corrects for mutation bias, in order to isolate the effects of selection.
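      The idea of deriving codon usage expectations from genome-wide %GC can be sketched as follows. This is a simplified illustration that assumes each codon position is drawn independently, with G and C equally likely and A and T equally likely; the function names are hypothetical, and this is not the authors' code, which is in the repository linked in their response.

```python
def nucleotide_probs(gc):
    """Per-nucleotide probabilities implied by a GC fraction, assuming G = C and A = T."""
    return {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}


def expected_synonymous_freqs(codons, gc):
    """Expected relative usage of synonymous codons if only GC content mattered."""
    probs = nucleotide_probs(gc)
    raw = {c: probs[c[0]] * probs[c[1]] * probs[c[2]] for c in codons}
    total = sum(raw.values())
    return {c: weight / total for c, weight in raw.items()}


# Lysine is encoded by AAA and AAG; 0.40 is an arbitrary example GC fraction.
lys = expected_synonymous_freqs(["AAA", "AAG"], gc=0.40)
print(lys)
```

      Under this kind of null, any departure of observed synonymous codon usage from the GC-based expectation is what remains to be attributed to selection, which is the logic the authors describe for CAIS.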

      Reviewer 1 also suggests that we examine the relationship between gene expression and GC corrected RSCU, as we would expect codon adaptation to be stronger in more highly expressed genes, as was previously shown in the non-GC corrected CAI metric (Sharp et al 1987). Correlations with gene expression are outside the scope of the current work, which is focused on producing a single value of codon adaptation per species. It is indeed possible that our general approach could be useful in future work investigating differences among genes.

      One key difference between our work and that of Galtier et al. 2018 is that our approach does not rely on identifying specific codon preferences per species. Our approach thus remains appropriate even for scenarios e.g. where different cell types, different environmental conditions, and/or different genes have different codon preferences (Gingold et al. 2014 https://doi.org/10.1016/j.cell.2014.08.011). At a high level, our results are in broad agreement with those of Galtier et al., 2018, who found that gBGC affected all animal species, regardless of Ne, and who like us, found that the degree of selection on codon usage depended on Ne. Through use of a more sensitive methodology, we believe we have expanded our ability to detect codon adaptation into animals of somewhat higher Ne than in previous work.

      We thank Reviewer 2 for explicitly laying out the math that was implicit in our Figures 1 and 2. In our revisions, we will more clearly acknowledge that the per-site codon adaptation bias depicted in Figure 1 has limited sensitivity to s*Ne. We believe our approach worked despite this because the phenomenon is driven by what is shown in Figure 2. I.e., where Ne makes a difference is by determining the proteome-wide fraction of codons subject to significant codon adaptation, rather than by determining the strength of codon adaptation at any particular site or gene.

      Simulated datasets would be great, but we think it a nice addition rather than a must-have, in particular because we are skeptical about whether our understanding of all relevant processes is good enough such that simulations would add much to our more heuristic argument along the lines of Figure 2. E.g. we believe the complications documented by Gingold et al. 2014 cited above are pertinent, but incorporating them into simulations would require a complex set of assumptions.

      In response to the final comment of reviewer 2, the reason that we hard-coded genome-wide %GC values is that we took them from the previous study of James et al. (2023) https://doi.org/10.1093/molbev/msad073. As summarized in the manuscript, genome-wide %GC was a byproduct of a scan conducted in that work, of all six reading frames across genic and intergenic sequences available from NCBI with access dates between May and July 2019. The code used in the current work to calculate the intergenic %GC, as well as that used to calculate amino acid frequencies, is located at https://github.com/MaselLab/Codon-Adaptation-Index-of-Species. We agree that more user-friendly tools would be useful, but producing robust tools falls outside the scope of the current manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      “Liu et al present a very interesting manuscript investigating whether there are distinct mechanisms of learning in children with ASD. What they found was that children with ASD showed comparable learning to typically developing children, but that there was a difference in learning strategy, with less plasticity and more stable learning representations in children with ASD. In other words, children with ASD showed similar learning performance to typically developing children but were more likely to use different learning rules to get there. Interestingly greater fMRI-measured brain plasticity was associated with learning gains in typically developing children, whereas more stable (less plasticity) neural patterns were associated with learning gains in autistic children. This was mediated by insistence on sameness (from the RRIB) in the ASD group. This is a good paper, well reasoned and with strong methods.”

      We appreciate the positive comments from the reviewer.

      1.1) “The biggest issue is related to subject numbers...With n=35 it is only possible to make a generalized statement about autism.”

      Thank you for this comment. Although the sample size in the current study was modest, we would like to note that acquiring high-quality behavioral and brain imaging data at multiple time points is a challenge in children with ASD. The current training study with unique longitudinal behavioral and brain imaging data provides an unprecedented opportunity to investigate the potentially atypical training-induced learning and brain plasticity in children with ASD relative to TD peers. To our knowledge, the present longitudinal sample is the largest of its kind in studies of neurocognitive function in children with ASD. We have acknowledged these points in the revised Discussion section (Page 15), including the following statement:

      “First, larger sample sizes are required to further characterize heterogeneous patterns of atypical learning and whether the findings can be generalized to a broader ASD population.” (Page 15)

      1.2) “[Another] issue is related to [heterogeneity of autism-related findings]. For example, take the following statement from the results: "while most TD children used the memory-based strategy most frequently following training, nearly half of the children with ASD used rule-based strategies most frequently for trained problems." Is this the heterogeneity of autism at play, or the noisiness of the task and measures?”

      We hypothesize that group differences in changes in strategy use following training are due to an atypical learning style or a high level of inter-individual differences, i.e., greater heterogeneity, in autism, rather than noisiness of the measures. This hypothesis is based on the fact that we used the same tasks before and after training and a standardized training protocol across the two groups, which (i) allowed us to systematically examine atypical learning of these tasks in children with ASD compared to TD children and (ii) provided ecologically valid measures. This design minimized potential differences in measurement error between the two groups. We have clarified these points in the revised Introduction section (Page 4), including the following statement: “Crucially, we employed identical tasks before and after training and a standardized training protocol across the two groups. This approach enabled systemic analysis of learning in children with ASD relative to TD children.” (Page 4)

      1.3) “Conceptually, is it realistic to expect a unitary learning strategy in all of autism?”

      We agree with the sentiment expressed by the reviewer, and indeed this notion led to the hypothesis that our study was to test. We hypothesized that children with ASD would not show a unitary learning strategy at this stage of development examined. Our results reveal that a disproportionate number of children with ASD use a rule-based strategy, reflecting atypical learning styles.

      1.4) “Lastly, the task itself can only be solved in a subset of autistic children and therefore presents a limited view of the condition.”

      We thank the reviewer for this important point and agree that additional studies tailored to more severely affected children with ASD are required for a more comprehensive characterization of learning in children with autism.

      Reviewer #2 (Public Review):

      “Overall, the authors sought to determine whether children with autism spectrum disorder (ASD) or typical development (TD) would both benefit from a 5-day intervention designed to improve numerical problem-solving. They were particularly interested in how learning across training would be associated with pre-post intervention changes in brain activity, measured with functional magnetic resonance imaging (fMRI). They also examined whether brain-behavior associations driven by learning might be moderated by a classic cognitive inflexibility symptom in ASD ("insistence on sameness"). The study is reasonably well-powered, uses a 5-day evidence-based intervention, and uses a multivariate correlation-based metric for examining neuroplastic changes that may be less susceptible to random variation over time than conventional mass univariate fMRI analyses. The study did have some weaknesses that draw into question the specific claims made based on the present set of analyses, as well as limit the generalizability of the findings to the significant proportion of individuals with ASD that are outside of the normative range of general cognitive functioning. The study also found minimal evidence for transfer between trained and untrained mathematical problems, limiting enthusiasm for the intervention itself. The majority of the authors' claims were rooted in the data and the team was generally able to accomplish their aims. I am sensitive to the fact that one of the main limitations I noted would have significant ethical implications-i.e. NOT offering potentially beneficial numerical training to children randomized to a sham or control group. I think the authors' work will represent a welcome addition to a growing corpus of studies showing similar neuropsychological test performance across several cognitive domains (e.g. learning, memory, proactive cognitive control, etc.) in ASD and TD. 
      However, these relatively preserved cognitive functions still appear to be implemented by unique neural systems and demonstrate unique correlations to clinical symptoms in youth with ASD relative to TD, which may have implications for both educational and clinical contexts.”

      We thank the reviewer for the positive feedback and helpful suggestions.
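      The "multivariate correlation-based metric" that Reviewer #2 refers to can be illustrated with a minimal sketch. It assumes that plasticity is summarized as one minus the Pearson correlation between pre- and post-training activation patterns across voxels; that definition, the function names, and the toy voxel values are assumptions made for illustration and may differ from the study's actual pipeline.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pattern_plasticity(pre, post):
    """Assumed plasticity index: 1 - correlation of pre- and post-training patterns.

    0 means the spatial pattern is unchanged; larger values mean more change.
    """
    return 1.0 - pearson(pre, post)


# Toy activation values for five hypothetical voxels.
pre_training = [0.2, 0.5, 0.1, 0.9, 0.4]
stable_post = [0.25, 0.55, 0.15, 0.95, 0.45]   # same pattern, shifted baseline
changed_post = [0.9, 0.1, 0.6, 0.2, 0.5]       # rearranged pattern

print(pattern_plasticity(pre_training, stable_post))
print(pattern_plasticity(pre_training, changed_post))
```

      Because the correlation ignores uniform shifts in overall activation, an index of this kind tracks changes in the spatial pattern rather than in mean signal level, which is one reason such measures may be less sensitive to session-to-session variation than voxel-wise differences.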

      Reviewer #3 (Public Review):

      “Liu and colleagues examined learning and brain plasticity in neurotypical children and children with autism. The main findings include autistic children relying more on rule-based versus memory-based learning strategies, altered associations between learning gains and brain plasticity in children with autism, and insistence on sameness as a moderator between brain plasticity and learning in autism. Although the sample size is limited in this study, the findings provide a significant contribution to the field. The major strengths of this paper include an extensive pre and post training protocol, a detailed methods section, rationale behind the study, investigation of a potential moderator of learning gains and neural plasticity, and investigation of "neural plasticity" in association to learning in autism. Weaknesses of the study include a small sample size, and some missing information/analyses from the study. The authors laid out four clear aims of the study. They investigated these aims and the analytic approaches were appropriate. The paper included significant findings toward better understanding the mechanisms underlying differences in learning strategies and behavior in children diagnosed with autism spectrum disorder. This holds significant value in educational and classroom settings. Further, the investigation of a potential moderator of learning gains and neural plasticity provides a potential mechanism to improve the relationship. Overall, this is a significant contribution to the field. The autism literature is limited in understanding differences in learning styles and the underlying neural mechanisms of these differences.”

      We thank the reviewer for the positive comments and detailed suggestions.

    1. While it may be obvious that there are specific technologies for those with different abilities that help them engage with their learning, never forget that how we choose existing learning technologies is probably the first step in ensuring access to our learners, and potentially presenting barriers to their learning. Learning Management Systems (LMSs) like Moodle, Canvas, Blackboard Learn, D2L Brightspace, Google Classroom and other technologies should have accessibility features built in as well – if they don’t, these foundational systems will present barriers for our learners. If we’re choosing to use ad-hoc or additional technologies that sit outside what our institutions have set up for us (e.g., Kahoot, Canva, etc.) it’s up to us to assess what technologies we use for accessibility.

      The key takeaway I think

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers


      1. General Statements

      We were naturally pleased to read the enthusiasm coming from both reviewers. Both mentioned that an extension to experimentation in cells would increase the impact of the study, even though both recognize that the biophysical and biochemical experiments constitute a study that is significant and interesting to a broad readership.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This manuscript by Bryan et al., describes the use of Hydrogen/Deuterium-exchange Mass Spectrometry (HXMS) as a powerful tool to identify key amino acid residues and associated interactions driving liquid-liquid demixing. They have particularly focused on the Chromosomal Passenger Complex (CPC), an important regulator of chromosome segregation, which has recently been shown to undergo liquid-liquid demixing in vitro. Their work presented here allowed them to identify a few key electrostatic interactions as molecular determinants driving the liquid-liquid demixing of the CPC. Their work also shows that crystal packing information of protein molecules, where available, can provide valuable insight into likely factors driving liquid-liquid demixing.

      Major comments:

      [#1] A previous study by Trivedi et al., NCB 2019 identified an unstructured region in Borealin (aa residues 139-160) as the main region driving the phase separation of CPC. Interestingly, this region only shows a moderate reduction in HX upon liquid-liquid demixing. But no experiments or discussions related to this observation are presented in the current version of the manuscript.

      In the Trivedi et al. paper, the authors were careful to state that the region of borealin between 139-160 contributed to phase separation, but there was clearly a remaining propensity to phase separate in vitro in the mutant. Thus, it is fully expected that there should be other regions in the complex that contribute to phase separation. It was satisfying that this region was independently identified in the hydrogen-deuterium exchange experiments and we suggest that a “moderate” reduction is consistent with a protein condensate having liquid properties. Since this region was already characterized, we have focused our work in this paper on the new region identified by the hydrogen-deuterium exchange experiments.

      [#2] In the absence of cellular data on if and how these mutations (within the triple-helical bundle region) affect CPC's ability to phase separate in cells, the implication of this work is very limited - One can't say for sure these are interactions driving phase separation of CPC in a cellular environment. In the absence of any cellular data with the mutants described here, much of the discussion on the possible roles of CPC phase separation in cells does not appear relevant to this manuscript. I would suggest that the authors focus mainly on highlighting the power of using HXMS as a tool to characterise the molecular determinants of liquid-liquid demixing at a relatively high resolution.

      We have now added cellular data in the form of one of the key experiments used to explore CPC liquid-liquid demixing, utilizing the Cry2 optogenetic system for inducible dimerization. The results of testing WT Borealin versus the mutant we identified as defective in droplet formation are shown in the new Fig. 6. Some relation of our overall findings, encompassing observations made with purified components and now in cells, to the cellular function of the CPC is pertinent. In light of the reviewer comments, we have also reduced this aspect in the discussion (see the substantial edits on pg. 12).

      Minor comments:

      [#3] The authors should ensure that the introduction cites relevant literature thoroughly. For example, where the potential role of Borealin residues 139-160 in conferring phase separation properties to the CPC is mentioned, the authors failed to cite Abad et al., 2019, which showed the contribution of the same Borealin region in conferring nucleosome binding ability to the CPC.

      We have made this particular change on pg. 4 and also have gone through to ensure we are appropriately citing relevant literature.

      Reviewer #1 (Significance (Required)):

      This is a highly relevant and significant work, particularly considering the rapidly growing list of examples for Phase separation of proteins/protein assemblies and their potential biological roles (in spite of ongoing debates in the field about the cellular relevance of several phase separation claims). The data presented in this manuscript are solid and convincingly establish HXMS as a useful tool to characterise molecular interactions driving liquid-liquid demixing. Considering its applicability to characterise wide-ranging protein assemblies implicated in phase separation, this work will be of interest to a broad readership.

      We thank the reviewer for the strong praise of the significance of our study.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, using the technique of hydrogen/deuterium-exchange mass spectrometry (HXMS), the authors have tried to gain insights into the structure of the chromosomal passenger complex (CPC) within the phase separated chromatin body, known to regulate chromosome segregation in mitosis. The CPC phase separated compartment comprises three regulatory and targeting subunits, INCENP, Survivin, and Borealin, forming a three-helix bundle hetero-trimer. By measuring changes in the polypeptide backbone dynamics of this trimeric INCENP/Survivin/Borealin complex, in the liquid-liquid de-mixed state in comparison to its soluble state, using HXMS measurements, the paper puts forward high-resolution structural details of the phase separated CPC. Using a step-wise mutagenesis approach in conjunction with the information from HXMS measurements and previous crystallographic data, this work also identifies distinct regions/interfaces within this complex harboring crucial salt bridges, which directly contribute toward the liquid-liquid demixing of the CPC.

      Comments:

      1) "The three non-catalytic subunits of the CPC (INCENP1-58, Borealin, and Survivin) form soluble homotrimers that have a propensity to undergo liquid-liquid phase separation.8 " Do the authors mean the hetero-trimeric CPC?

      Yes, we meant heterotrimers. It is now corrected.

      2) For better clarity, the authors can indicate the residue numbers of each of the components INCENP, Survivin, and Borealin in the CPC trimeric helix-bundle crystallographic structure in Fig 1.

      These are included on the revised Figure 1A.

      3) "In the condition we identified, 90% +/- 5% of the ISB protein was found within the rapidly sedimenting droplet population (Fig. 1C)." The authors should include the time-point corresponding to the gel shown in Fig 1C.

      This information is now directly labeled in Fig. 1C.

      4) Prior to the HXMS experiments on the phase-separated ISB protein complex, were the samples subjected to sedimentation to separate the dispersed from the condensed droplet phase? Since several time points after formation of phase-separated ISB complex have been characterized to compare and contrast between the dispersed and the droplet phase, the authors can consider performing a time-dependent sedimentation assay to ascertain the fraction of the ISB complex in the droplet phase.

      The HXMS experiments were not performed on sedimented samples, so this complication in our HX workflow is not necessary. We note that the sedimentation that we include in our study (Figs. 1C, 5E, and S6), involves centrifugation for 10 minutes, and that length of time presents a substantial design challenge to our HX experimentation. We considered it at the outset of our study, but, in the end, our study was facilitated by our finding early on that this separation step was unnecessary. Further, we note that we report statistically significant differences at the earliest HX timepoints in the areas prominently protected from HX upon droplet formation (10 and 100 s; see Fig. 1C for an example). Indeed, we do not observe broadening of our HXMS spectra (examples shown for all timepoints, Fig. 2B,F) that would be expected if there were a large degree of mixed states (i.e. a large population of molecules in the free protein state and a large population of molecules in the droplet state) each having different HXMS rates. One can imagine that this sort of envelope broadening behavior (“EX1-like”) could be observed in other samples where there are multiple substantially populated states of a protein present at a particular timepoint, but this is not what we observe in the experiments we performed in this study.

      5) "At the 100 s timepoint, the most prominent differences between the soluble and droplet state were located within the three-helix bundle of the ISB, with long stretches in two subunits (INCENP and Borealin) and a small region at the N-terminal portion of the impacted α-helix in Survivin (Fig. 1F)" According to Fig 1F, at the 100 s time-point, there is also another small region in Survivin (approximately residues 12-20) that exhibits slower exchange rates in the droplet state. Can the authors comment on whether this region undergoes any conformational change or if it exhibits homotypic interactions retarding the hydrogen/deuterium exchange rates in the droplet phase?

      Our general approach in the Black lab over the past decade-plus of HXMS has been to restrict our conclusions, whenever practical, to the consensus behavior. This permits multiple partially overlapping peptides to be used to generate confidence in the changes that drive our conclusions. The reviewer carefully recognizes the behavior of a single peptide (in 2 different charge states) that might have actual changes relative to some of the longer peptides that it partially overlaps with, and smaller changes can yield larger percentage changes on small peptides. We have chosen not to include this single peptide in the text describing our main conclusions from the work, to be consistent with our longstanding strategy for rigorous interpretation of HXMS data. Our conclusion is that this region is not substantially changed upon droplet formation.

      6) The authors mention that: "By the latest timepoint, 3000 s, there was some diminution in the number of droplets which may indicate the start of a transition of the droplets to a more solid state (i.e., gel-like)." As a result of this time points beyond 3000 s have not been used for comparing Hydrogen/Deuterium exchange rates in the condensed droplet phase with the soluble state. Can the authors comment on what happens to the nature of these specific interactions between the components of the CPC in the 'gel-like state'? A combination of both non-specific weak interactions as well as strong site-specific interactions between macromolecular components has been widely known to contribute towards the formation of several phase-separated compartments. It will be interesting to know the perspective of the authors on what sort of interactions get populated within these compartments to give rise to a more solid gel-like state. At this later time points, do the droplets exhibit reversibility under higher ionic strength conditions? Do the authors have some data to show how the material property of these droplets evolve as a function of time?

      We offered the idea of a transition to a more solid state to the reader because it was a reasonable conclusion, although challenging to prove (something the Stukenberg lab is actively working on, though, see our response to point #9, below). The vast majority of our conclusions in the paper, and essentially all of what we emphasize are the important ones, are based on earlier timepoints where this is not an issue. Thus, we find an extended study of the late-developing features in our droplets something more appropriate for separate studies outside the scope of the current one.

      7) "Examination of the entire time course shows that during intermediate levels of HX (i.e., between 100-1000 s), this region takes about three times as long to undergo the same amount of exchange when the ISB is in the droplet state relative to when it's in the free protein state (Figs. 2B, C and Supplemental Fig. 2). Upon droplet formation, HX protection within Borealin is primarily located in the interacting α-helix and is less pronounced at any given peptide when compared to INCENP peptides (Fig. 2E). Nonetheless, similar to INCENP peptides, it still takes about twice as long to achieve the same level of deuteration for this region of Borealin in the droplet state as compared to the free state." How do the hydrogen/deuterium exchange rates and extent of deuteration in the N-terminal part (residues 98-142) of the Survivin polypeptide chain, constituting the three-helix bundle core, evolve as a function of time? Also, how do the exchange rates for peptides in this region compare with those of the other protein subunits Borealin and INCENP and what inference can be drawn from these differences?

      The peptides from a.a. 98-142 of Survivin exhibit HX protection through the timecourse (and before and after droplet formation) consistent with a folded α-helix (and comparable to the overall HX behavior of the other helices in the 3-helix bundle of the ISB) (Fig. S2). There is subtly slower HX in the droplet state for this region at later timepoints for this portion of Survivin (Fig. S4), and this is explicitly highlighted in the Results section on pg. 6.

      8) The authors mention that mutating either all the glutamate residues or combinations of these residues on the acidic patch on the INCENP subunit, to positively charged residues, causes a decrease in the propensity of phase separation, as formation of salt bridges with Borealin subunit from adjacent hetero-trimeric complexes appears to be the major driving force for phase separation. Can the authors elaborate on how the reduction in the phase separation propensity of these salt-bridge inhibiting mutants might be directly affecting the subsequent localization of the CPC to the inner centromeres? Can the authors supplement their existing in vitro data with further in vivo characterization of CPC recruitment or localization to the centromeres, for each of the constructs exhibiting reduced propensity of phase separation?

      As we state in the introduction, the recruitment to centromeres requires established ‘conventional’ targeting via the specific histone marks to which we refer. We also cite the correlations demonstrated between prior mutations in Borealin (impacting aa 139-160) that both disrupt phase separation in vitro and reduce CPC levels at the centromere. In our revision, we have added what we feel are the most critical cell-based experiments to relate to our HX studies in the new Fig. 6. We are preparing for future studies to study mutants arising from our HX studies, and our plans are to pursue gene replacement approaches that will rigorously test the impact on the mitotic function of the CPC. In the process of these future studies, the impact on localization will be measured, too. As others in the field are investigating the correlations between observations made with purified components and those made in the cell, and where there are nuances at play in how the actual experiments are conducted, we are certain our cell-based studies will extend far beyond the timeframe appropriate for our HX-focused study. Rigorous cell-based studies of mitotic functions are what is needed, however, and we have made our plans with that in mind.

      9) It might be really interesting for the authors to look at the recent preprint from Hedtfeld et al. 2023 Molecular Cell, (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4472737). In this preprint they have recombinantly purified a stoichiometric trimer (referred to as CPC-TARGWT) comprising full length survivin, borealin, and a 1-350 residue fragment of INCENP (instead of 1-58 used in this study) and have tried to assess if any correlation exists between the in-vitro phase behaviour of CPC-TARGWT mutants and their corresponding recruitment to the inner centromere, to form a phase separated compartment. Targeting residues in the BIR domain of Survivin involved in interactions with the N-terminus of the Histone H3, Shugoshin 1 or in the recognition of H3T3phos, and substituting them with Alanine or completely deleting the C-terminal domain of Borealin (a region implicated in CPC dimerization and centromere recruitment), was found to result in poor centromere localization, although the in vitro phase separation properties of these constructs were found to be indistinguishable, suggesting no evident correlation between the two phenomena. Thus it might be a useful piece of data to correlate the phase separation propensities of the ISB complex variants used in this current study with the extents of their in vivo recruitment to the inner centromere. This may be beyond the scope of the paper, but it would be good to comment on this.

      For the correlation studies, please refer to our response to point #8, above. From our reading of the June 2023 preprint that the reviewer mentions, the main concern raised by the authors is questioning whether the region first identified in the Trivedi et al. paper in Borealin (aa 139-160) has a role in phase separation. As the reviewer noted, Hedtfeld et al. report using a complex that includes more of the INCENP protein than used in the Trivedi et al. study, complicating the direct comparison between studies. Using the data in figure 5E of the Hedtfeld et al. preprint, the authors suggest that their version of the in vitro complex containing the Borealin mutant Δ139-160 has phase separation properties similar to those of the wild type. However, we note that in our inspection of these data we see numerous differences. The mutant forms rounder and larger condensates than WT, and these have a reduced concentration of protein (lower intensity). Finally, only the WT protein has a “grape bunch” morphology. We note that unpublished data in the Stukenberg lab show these same differences can represent a defect in liquid demixing properties of a version of the purified CPC. While it is intuitive that larger condensates represent more phase separation, the unpublished data mentioned above suggest the opposite is true for the CPC. In particular, the data from the Stukenberg lab suggest the size of a droplet is mostly governed by the amount of droplet fusion in the first minutes after dilution and thus is limited by relatively rapid hardening of the complex. We note that in the course of discussions with the corresponding author of the preprint mentioned by the reviewer we did apprise them of the unpublished observations mentioned above, in case they saw fit to include in their ongoing studies what would seem to be critical measurements (e.g. measuring circularity, droplet size, droplet intensity, and FRAP) to assess our suspicion that their construct contains a portion of INCENP that can accelerate condensate formation. If true, the Hedtfeld et al. data are fully consistent with the Borealin mutant Δ139-160 having a significantly different condensate formation potential from the WT protein.

      10[A]) "Our data also provide an important clue about the previously identified region on Borealin that is required for liquid-demixing in vitro and proper CPC assembly in cells 8. Specifically, our data (Fig. 1F, Supplementary Figs. 2, 4A) suggest this region of Borealin adopts secondary structure that undergoes additional HX protection in the liquid-liquid demixed state" This data fits perfectly with previous studies from Trivedi et al. (2019), which states that deletion of the Borealin 139-160 fragment obliterates its phase separation in vitro and also reduces the accumulation of CPC at the centromere. On the contrary, in the recent preprint from Hedtfeld et al. 2023 Molecular Cell, they have shown that the phase separation behaviour of their reconstituted CPC-TARGWT harboring the Borealin 139-160 deletion mutant was found to be indistinguishable from the WT. Can the authors comment on what might be the reason for this difference? Is it possible that this central Borealin region is involved in interactions with the additional fragment of INCENP subunit used in the helical bundle reconstitution, or with other centromere component proteins, whereby the deletion of region is causing inefficient recruitment to the inner centromere? This can be elaborated in the discussion section of the manuscript.

      This is discussed in the response to #9, above. Through this format (the Review Commons procedure for public posting of author responses before submission of the study to a journal), our comments herein will be made public for those with the most interest in comparing our data to what has been posted on preprint servers. We think that is the most appropriate approach for now, with more to surely come when the aforementioned results from the Stukenberg lab are posted/published and, hopefully, when there is more information about the nature of the droplets reported in the Hedtfeld et al. study.

      10 [B]) It is also well known that in addition to these electrostatic interactions, the core of the ISB helical bundle is formed by an extensive network of hydrophobic interactions. Have the authors ever looked into how perturbing any of these intra-trimeric complex hydrophobic interactions affect their ability to phase separate and perform their subsequent function?

      We think there is some confusion, here. The electrostatics we focus on are between heterotrimers rather than within them. We certainly would predict that disrupting the hydrophobic surface that generates a stable heterotrimer would, in turn, disrupt individual heterotrimers. Our study assumes a stable heterotrimer as a starting point, so we view this type of perturbation as unrelated to our conclusions.

      11) The phase separated CPC compartment is known to enrich several other inner centromere proteins such as the Histone H3, Sgo1, the histone H3T3phos, among others. Have the authors tried to increase the complexity of the reconstituted CPC scaffold by incorporating more components to look into whether that changes any of the interaction interfaces between the ISB trimeric complexes within the condensed phase? Can this CPC compartment be reconstituted using a bottom-up approach?

      We are glad that our studies with a reductionist biochemical reconstitution approach have inspired the questions that require increased complexity. They are now warranted based on the advance we have made in the present study, and hopefully will form the basis for future, separate studies.

      Overall, this paper brings forward a useful technique to probe the conformational landscape of proteins in the condensed droplet phase and compare it with its dispersed phase. This paper serves as an interesting read showing how specific salt-bridge interactions between multiple stoichiometric protein complexes can be the driving force for phase separation.

      Reviewer #2 (Significance (Required)):

      In this manuscript, using the technique of hydrogen/deuterium-exchange mass spectrometry (HXMS), the authors have tried to gain insights into the structure of the chromosomal passenger complex (CPC) within the phase separated chromatin body, known to regulate chromosome segregation in mitosis. The CPC phase separated compartment comprises three regulatory and targeting subunits, INCENP, Survivin, and Borealin, forming a three-helix bundle hetero-trimer. By measuring changes in the polypeptide backbone dynamics of this trimeric INCENP/Survivin/Borealin complex, in the liquid-liquid de-mixed state in comparison to its soluble state, using HXMS measurements, the paper puts forward high-resolution structural details of the phase separated CPC. Using a step-wise mutagenesis approach in conjunction with the information from HXMS measurements and previous crystallographic data, this work also identifies distinct regions/interfaces within this complex harboring crucial salt bridges, which directly contribute toward the liquid-liquid demixing of the CPC.

      Overall, this paper brings forward a useful technique to probe the conformational landscape of proteins in the condensed droplet phase and compare it with its dispersed phase. This paper serves as an interesting read showing how specific salt-bridge interactions between multiple stoichiometric protein complexes can be the driving force for phase separation.

      We thank the reviewer for the positive comments on the significance of our study.

    1. Residents crossing between islands during a rising tide on Majuro, Marshall Islands, in 2015. Majuro is home to former residents of Bikini Atoll who were relocated in the 1940s. Credit: Josh Haner/The New York Times

      By Pete McKenzie, May 3, 2023

      The golden sand of Bikini Atoll is laced with plutonium. The freshwater is poisoned with strontium. The coconut crabs contain hazardous levels of cesium.

      In the 1940s and ’50s, the U.S. government used this coral reef, in the Pacific nation of the Marshall Islands, for testing nuclear weapons. Radioactive residue has left Bikini uninhabitable to this day, forcing those whose families once lived on the atoll into exile on a handful of other Marshallese islands and in the United States.

      Recognizing the damage its testing caused, the U.S. government established two trust funds in the 1980s to help pay for Bikinians’ health care, build housing and cover living costs. In 2017, after a campaign by Bikini leaders for greater autonomy, the Trump administration announced that the government would lift withdrawal limits and stop auditing the main fund, then worth $59 million.

      Six years later, only about $100,000 remains, and the Bikini community is in crisis.

      Anderson Jibas, the mayor of the council that oversees the displaced Bikini community, made a series of questionable purchases on Bikini’s behalf, including of a large plot of land in Hawaii and a fleet of new vehicles. He has defended some of the purchases as investments against climate change, as necessary to support isolated Bikinians and as attempts at revenue-generating projects.

      Mr. Jibas has also acknowledged using trust fund money for personal expenses and has been accused by a top Marshall Islands official of receiving kickbacks from an investment manager — a charge Mr. Jibas denies.

      Image: A U.S. nuclear bomb test at Bikini Atoll in 1946. Credit: Universal Images Group, via Getty Images

      With the fund virtually depleted, the council’s roughly 350 employees are no longer being paid. Monthly payments of about $150 each to the community’s 6,800 members — a vital lifeline that helped cover food and rent among a population with high rates of poverty — have ceased.

      The emergency highlights the lasting consequences of decades of U.S. nuclear testing in the Pacific, including lingering questions about the American commitment to address that legacy, an undertaking made more difficult by pervasive fraud and mismanagement in the region.

      “It’s a disaster,” said Tommy Jibok, a former member of the Bikini council who challenged Mr. Jibas in an election in 2019. “They told us we would be sitting and sleeping on money. Look what is happening now. We’re sleeping on nothing.”

      In 1946, the United States relocated the 167 inhabitants of Bikini to clear the way for nuclear tests that it said would “end all world wars.” It then left them virtually alone on a small, desolate island, where many nearly starved. In 1948, the islanders were moved again.

      Over 12 years, the United States tested 23 nuclear bombs in Bikini. In 1968, President Lyndon B. Johnson announced that the Bikinians would return home. But after scientists found that radiation levels remained dangerously high, the United States in 1978 evacuated the almost 150 people who had chosen to go back. The Marshall Islands gained independence from the United States the next year.

      In 1982, the American government established a $25 million resettlement fund to clean up Bikini and support its people. In 1987, it created a second fund to provide annual payments directly to Bikinians. A year later, it contributed an additional $90 million to the resettlement fund. American officials administered the money and could veto withdrawals.

      Bikini representatives argued that the resettlement fund contained too little money to remedy the atoll’s radioactivity. They used the funds instead to support the exiled Bikinians.

      Image: Mike Pompeo, then the secretary of state, visiting in the Marshall Islands in 2019. With him is Hilda Heine, the Marshallese president from 2016 to 2020. Credit: Jonathan Ernst/Agence France-Presse — Getty Images

      But the Bikini leaders were frustrated by American officials’ refusal to release more than a few million dollars each year. The struggle culminated in 2016 with the election of Mr. Jibas, who promised to take control of the resettlement fund. (The other fund is overseen by independent trustees.)

      During a 2017 congressional hearing, Mr. Jibas explained that Bikinians “know far better than the intermediaries or distant agencies of the United States what is needed to make the lives of the displaced population more bearable.”

      Douglas Domenech, at the time an assistant interior secretary, announced that the Interior Department would relinquish control of the resettlement fund to “restore trust and ensure that sovereignty means something.”

      Mr. Jibok, the former Bikini council member, had a different interpretation: that U.S. officials wanted to “wash their hands clean” of responsibility for Bikinians.

      Whatever the motivation, the result was a rapid increase in council spending under Mr. Jibas, from $7.6 million in 2016 to $25.7 million in 2018, according to audits from the time. Bank statements provided by Gordon Benjamin, a lawyer for the council, show that the fund, worth $59 million in 2017, was down to just $100,041 in March of this year.

      Many of the council’s purchases were popular, including of a small aircraft and two cargo ships to help supply isolated Bikinians, as well as construction equipment to build protections against rising seas that threaten low-lying Pacific islands because of climate change.

      But there were also more dubious purchases: $4.8 million for 283 acres of land in Hawaii; $1.3 million for an apartment complex in the Marshall Islands’ capital, Majuro; and multiple new vehicles for the personal use of Bikini council members, according to Mr. Benjamin. Mr. Jibas also introduced an annual $100,000 “representation package” to fund his regular trips to the United States.

      Image: Isles that form part of Majuro, the Marshall Islands’ capital. One of the purchases made with the resettlement fund was an apartment complex in Majuro. Credit: Josh Haner/The New York Times

      Mr. Jibas has said he wants to develop housing in Hawaii for rent or sale, but no development has taken place yet. The Majuro apartment complex was purchased as an investment property, but it appears to be losing money so far.

      Lani Kramer, a Bikinian who previously worked as the council’s city manager and is now challenging Mr. Jibas for the mayoralty, said Mr. Jibas and council members had used public funds for personal spending. “They were bringing receipts for diapers, chewing gum,” Ms. Kramer said. “It was obviously not for the people, it was for their own grocery shopping.”

      The Marshall Islands’ banking commissioner has also accused Mr. Jibas of accepting $50,000 from a local bank manager who is being prosecuted on suspicion of unlawfully investing Bikini funds and laundering money. The Marshallese auditor general did not respond to requests for comment about the allegations.

      Starting in 2018, Mr. Jibas refused to disclose council finances to the Marshall Islands’ auditor general, prompting the police to seize council documents in 2021. Late last month, a spokesman for the Interior Department said it had written to bank officials seeking information about the fund and to Mr. Jibas requesting the council’s recent budgets.

      That request came after Jack Niedenthal, an American expatriate who served as the Marshallese health secretary, wrote to the Interior Department warning about the depleted trust fund and asking the department to intervene. He was subsequently fired for breaching diplomatic protocol by circumventing the Marshallese foreign ministry and the American Embassy.

      Mr. Jibas acknowledged in an interview that he occasionally used his representation package to buy food and other items for his family, which he said council staff members were aware of and had approved, but he denied taking money from the bank manager.

      Image: Collecting laundry on Ejit, an isle in Majuro. The money from the resettlement fund is nearly gone, and the Bikini community is in crisis. Credit: Josh Haner/The New York Times

      Mr. Jibas said in the interview that he was trying to access the independently controlled second fund, which now holds $28 million, to sustain council spending.

      According to Mr. Benjamin, starting in October 2021 the trustees of that fund permitted the council to withdraw roughly $13 million to fund its spending, but reversed their stance earlier this year and halted all payments out of the fund, including the regular living payments to Bikinians, to avoid further depletion. In the interview, Mr. Jibas said he also hoped to tap into new American funding to replenish the main fund.

      Earlier this year, the Biden administration promised to provide the Marshall Islands $700 million in one-time aid and to continue underwriting much of the government’s budget. Under a treaty, the United States controls the country’s defense policy, which the American government considers crucial to countering China in the region. The aid has not yet been approved, meaning Bikinians’ future remains uncertain.

      In a statement on behalf of Mr. Jibas, Mr. Benjamin said that the mayor’s critics were not pushing the United States hard enough for more funding.

      Mr. Jibok, who as a council member opposed Mr. Jibas’s efforts to gain control of the fund, said that the United States had done little to facilitate self-sufficiency in the Bikini community, leaving few financial safeguards in place.

      “I didn’t think we were ready,” Mr. Jibok said, “because I knew that we didn’t have anything in place to control” mismanagement or fraud.

      A version of this article appears in print on May 4, 2023, Section A, Page 4 of the New York edition with the headline: Bikini Atoll Leaders Blew Through Millions From U.S.
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Centrioles are small cylindrical structures with roles in cell division, motility, and signaling. Typically, centrioles are highly stable structures which can persist for many cell generations. However, in some cells, such as the female germ line of many species, centrioles are programmed for elimination. This process is essential for maintaining centriole number from one generation to the next in sexually reproducing organisms, yet in nearly all species the molecular mechanisms underlying how centrioles are eliminated are unknown. The current study utilizes the nematode C. elegans to explore how centriole architecture changes during the elimination program in the female germ line. Using a suite of light microscopy techniques, the authors provide a stunning visual perspective of how centrioles are disassembled during oogenesis and show that removal of the central tube component SAS-1, a key regulator of centriole stability, is an early event in elimination. I have no major objections to the work and enthusiastically endorse its publication with the following minor revisions.

      Page 9 line 200: In the pcmd-1 mutant, the authors state that centriolar foci devoid of nuclei are present in rachis, but they do not mention in the text that there are also nuclei that lack centriole foci in early pachytene. This is mentioned in the figure legend, but I felt it was important enough to mention in the text.

      As per the reviewer’s suggestion, we will provide this information in the main text as well.

      Page 9 line 211. The authors found that in the absence of dynein heavy or light chain that centrioles remain associated with the nuclear envelope (rather than moving to the periphery). To me this was striking as dynein depletion in the embryo results in the opposite phenotype with centrioles losing attachment to the nuclear envelope and moving to the cell periphery (Gonczy et al. 1999 JCB 147:135). It might be worth pointing this out somewhere in the manuscript and speculating about the reasons for this difference.

      We will expand the Discussion section to better explain the difference of dynein’s involvement in the oocyte versus the embryo.

      Page 11 line 277: The authors state that elimination timing is not affected by the loss of SPD-5. This is a small but important point. It really is the absence of PCMD-1 and not SPD-5, as SPD-5 is still present in the cell. An alternative would be to say "in the absence of PCM" or "in the absence of a pericentriolar accumulation of SPD-5".

      Fully agreed, we will modify the text accordingly.

      Figure 4D: Why does loss of PCMD-1 result in a delay in oocyte maturation as judged by RME-2 accumulation? This is not mentioned in the paper. Is this a general response to a loss of PCM or is this specific to a loss of PCMD-1?

      We realize that we were not sufficiently clear in explaining that RME-2 accumulation reflects the maturation state of oocytes. In the revised manuscript, we will clarify this point further and mention that a mild developmental delay (such as in pcmd-1(t3421ts) mutant animals) can impact the number of maturing oocytes present in the proximal gonad, and thereby lead to a slight shift in RME‑2::GFP distribution. See also related minor comment 2 of reviewer 2, and major comment 1 of reviewer 3.

      Figure 7 E and F. The authors measure the tubulin and SAS-4 intensity in wild-type and sas-1(t1521) embryos and conclude that microtubules and SAS-4 signals decay faster in the sas-1 mutant than in the control. To me, this is convincingly the case with microtubules in panel E, but I am not so sure this is the case with SAS-4 as shown in panel F. The differences in SAS-4 levels are much smaller between mutant and control. Could the authors provide statistical analysis to show how significant the differences are?

      We will provide the requested statistical analysis (which indeed shows significance).

      Page 15 line 363. I think this sentence should be reworded to: "Finally, we demonstrate that the central tube protein SAS-1 is the first of the factors analyzed here to leave centrioles..."

      In response to this suggestion and to the related comment of reviewer 2 (see below), we will rephrase this sentence to read “among the centriolar components analyzed to date, SAS-1 is the first to depart”.

      Reviewer #1 (Significance (Required)):

      The work contained in this manuscript represents a fundamental step forward in understanding the process of centriole elimination. The authors have carefully described the stepwise disassembly of the centriole, including changes in the architecture during oogenesis. They have identified loss of the centriole stability factor SAS-1 as an early event in the elimination program and have found that in a sas-1 mutant, the centriole disassembles prematurely. They have also shown that loss of SAS-1 is followed by expansion of the centriole and ultimately loss of structural integrity. This work should be of interest to a broad range of scientists, including those interested in centrosome dynamics, germ line development, and more generally cell biologists.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript Pierron et al. explore the mechanisms of centriole elimination during oogenesis in C. elegans. Centriole elimination is a common feature of oogenesis in many species, but it is relatively poorly understood and understudied. Here, the authors characterise the kinetics with which several key centriole and centrosome proteins are lost during this process in living worms, and they correlate this with EM and expansion microscopy (U-Ex-STED) analyses of fixed tissues. They conclude that centriole elimination begins with the loss of SAS-1 from the central region of the centrioles, which correlates with the widening of the structure and the loss of the centriole MTs. A remnant structure containing several core centriole proteins remains, however, and this often ultimately detaches from the nuclear envelope and moves towards the plasma membrane in a MT-motor-dependent fashion before it dissipates (although detachment from the nucleus does not seem to be required for the eventual elimination of this residual structure). Intriguingly, centriole loss in this system does not appear to require the down-regulation of PLK activity, which is in contrast to the situation in Drosophila oogenesis.

      The manuscript is generally well written and the data is of a high quality and is logically and clearly presented. Although the ultimate mechanisms regulating centriole elimination remain obscure (i.e. what triggers the loss of SAS-1, and how is this regulated?), the data presented here will be of significant interest to the centriole/centrosome field and I am supportive of publication. I have a few points that the authors should consider prior to publication.

      Major comment:

      In the EM shown in Figure 5F the authors claim that the central tube of the centriole is disrupted, but the other elements (inner tube, MTs and paddlewheel) are not. I don't think this is as clear cut as the authors claim, at least from comparing the images of the one normal centriole (5E) and one centriole that is starting to be eliminated (5F). It seems much harder to distinguish the MTs and the inner tube in the image in 5F. Perhaps this is obvious to the authors as they have compared many more images, but I think they need to find some way of showing this more convincingly (a montage of multiple centrioles?).

      We understand that Figure 5F alone may have left the reviewer wondering whether the central tube is truly the first element to be disrupted during centriole elimination. We plan on strengthening this point by providing additional EM images as a Supplemental Figure.

      This same issue is compounded in Figure 6D where, using a different technique (U-Ex-STED), the authors claim that the centriolar distribution of SAS-1 is gradually disrupted as centriole elimination proceeds. It does look like the amount of SAS-1 has decreased from early prophase to late pachytene, but the central tube it stains doesn't look particularly disrupted and, if anything, the MTs look more disrupted (and also possibly of lower intensity, perhaps explaining why the ratio of SAS-1/tubulin doesn't change very much over these stages, as shown in Figure 6G).

      As the reviewer correctly noticed, there is some variability in central tube removal during oogenesis. In some cases, such as in the centriole on the right of the late pachytene panel in Fig. 6D, SAS-1 signal intensity diminishes uniformly, without apparent holes in the central tube. By contrast, in other cases, such as in the centriole on the left of the late pachytene panel, SAS-1 signal intensity diminution is accompanied by a loss of central tube continuity. We will clarify the writing and qualify our findings on this important point in the revised manuscript.

      These points are important, as throughout the manuscript the authors assume it as a fact that SAS-1 leaves the centriole early (which is clear), and that this leads to the specific loss of the central tube (which, at least on the basis of this data, is not so clear).

      As mentioned above, we will make certain that the results linking SAS-1 departure and central tube loss are explained in a clear and balanced manner in the revised manuscript.

      Minor comments:

      1. The authors state that the kinetics of GFP-SAS-7 or SAS-4 loss were not altered in pcmd-1 mutants (Figure 4A-C; Figure S3E,F). This doesn't look correct to me, as both proteins seem to stay brighter for longer in the mutant embryos (and this is quite easy to see on the quantification graph for SAS-7 in Figure 4C). It looks similar for SAS-4 from the pictures shown in Figure S3E,F, although this data is not quantified (and is there any reason why this data is not quantified?).

      As mentioned in response to reviewers 1 and 3, we will mention in the revised manuscript that a mild developmental delay can impact the number of maturing oocytes present in the proximal gonad, thereby leading to this slight shift in GFP::SAS-7 and GFP::SAS-4 persistence.

      2. The authors state that they demonstrate that SAS-1 is the first component to leave the disassembling centrioles. I would rephrase as they can't know this for sure (i.e. there could be some untested component that leaves earlier).

      In response to this suggestion and to the related comment of reviewer 1 (see above), we will rephrase this sentence to read “among the centriolar components analyzed to date, SAS-1 is the first to depart”.

      In the latter part of the Discussion the authors state that SAS-1 is critical for centriole elimination. I would rephrase, as this seems to suggest it is required for centriole elimination, which is not the case. It might also be worth discussing that the elimination machinery clearly seems to target SAS-1 early on, but we don't yet know what this machinery is or how it is regulated.

      We thank the reviewer for raising this important point, which we will implement in the Discussion accordingly.

      Reviewer #2 (Significance (Required)):

      The manuscript is generally well written and the data is of a high quality and is logically and clearly presented. Although the ultimate mechanisms regulating centriole elimination remain obscure (i.e. what triggers the loss of SAS-1, and how is this regulated?), the data presented here will be of significant interest to the centriole/centrosome field and I am supportive of publication. I have a few points that the authors should consider prior to publication.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Pierron et al. use C. elegans oocytes to tackle a fundamental, yet heavily under-studied question in developmental biology: how are centrioles eliminated during gamete formation/maturation? The paper's main conclusion is that SAS-1 (a key protein that makes up the central tube in C. elegans centrioles) plays a critical part in regulating the timing of centriole elimination. I congratulate the authors on all the experiments related to the SAS-1 part of their story, as they are done meticulously and in unprecedented detail (particularly all the fascinating EM and expansion microscopy data!).

      The paper also concludes that the Polo-like kinase family does not have a central role in this process, in stark contrast to a previous report demonstrating their importance for centriole elimination in Drosophila oogenesis (Pimenta-Marques et al. 2016 Science). Unfortunately, I am less convinced about this part of the paper, and half of my major comments below relate to the experiments/analyses in this regard. I was similarly not very enthusiastic about a part of the story that I didn't find very relevant to the main point of the paper: half of the centrioles detach from the nucleus and translocate to the plasma membrane prior to their elimination. I find the observations here quite epiphenomenal and lacking a direct/mechanistic relevance to either the PLK or SAS-1 part of the story. In my view, the authors should consider taking this part out.

      Regarding this last suggestion: we think that even if the movement of centriole remnants is not essential for final removal, an account of this process provides important information about cellular dynamics during oocyte maturation. We note also that the two other reviewers did not raise this point, but leave the final decision to the editor.

      Overall, the piece is well written and organized, however it suffers from several shortcomings that preclude it from publication in its current form. I list my criticisms and suggestions below.

      Major comments:

      1. The authors state firmly at several places in the text that PCM components do not contribute to the timing of centriole elimination (e.g., lines 420-421), particularly given their experiments with Polo kinase paralogs. In my view, the data speak otherwise. The centriole elimination process appears strikingly premature when SPD-5 [1] (another PCM component) is overexpressed with the fluorescent transgene (Figure 1I). The opposite is also true: when another PCM component, PCMD-1, is knocked down by a temperature-sensitive allele, the centriole elimination process is severely delayed [2] (Figure 4C). Even more extremely, in the epistatic Polo mutant conditions (Fig. S3B), the centrioles do not appear to be eliminated at all [3] (though the authors prefer to interpret this result differently in lines 260-263, which could be flawed per my second comment below). How do the authors explain all these intriguing results? (underlining and numbering added above to clarify our responses point by point hereafter)

      1 > We respectfully disagree, since our quantifications show clearly that the SAS-7 signal disappears with an analogous timing in the line expressing RFP::SPD-5 (Fig. 1J) when compared to the other lines (Fig. 1D, 1F and 1H). The image shown currently for RFP::SPD-5 (Fig. 1I) is somewhat of an outlier compared to the others (Fig. 1C, 1E and 1G), and we will therefore provide a more representative specimen in the revised manuscript to avoid confusion.

      2 > As mentioned also in response to reviewers 1 and 2, we realize that we were not sufficiently clear in explaining that RME-2 accumulation reflects the maturation state of oocytes. In the revised manuscript, we will clarify this point and mention that a mild developmental delay (such as in pcmd-1(t3421ts) mutant animals) can impact the number of maturing oocytes present in the proximal gonad, and thereby lead to a slight shift in RME‑2::GFP distribution (as opposed to representing a delay in centriole elimination in pcmd-1(t3421ts) mutant animals).

      3 > We used plk-1(or683ts); plk-2(ok1936) double mutants to further test whether there might be premature elimination in this strong reduction-of-function condition compared to RNAi-mediated depletion. Although centriolar foci appear to remain for a longer time, these gonads are extremely disorganized, so that our conclusions regarding PLK-1 and PLK-2 are based primarily on the combined data shown in Fig. 3 and Fig. S3, which do not exhibit premature centriole elimination. We will rectify the writing to clarify these points.

      Also, I believe these claims (on the PCM components and their role in centriole elimination) will benefit from more nuanced statements. For instance, although Plk paralogs may not be necessary for the centriole elimination process, some other centrosome components clearly are. Paradoxically, the effects observed here (when disrupting or promoting PCM formation) are the exact opposite of those observed in Pimenta-Marques et al. 2016 Science. The 2016 piece claimed that the loss of PCM renders centrioles more vulnerable to losing their stability (which makes sense). How do the authors interpret their own results (i.e. that a disturbed PCM leads to slower centriole elimination, and vice versa)?

      As suggested by the reviewer, we will consider toning down claims regarding the role of PCM components in centriole elimination. Moreover, we will expand the section in the Discussion comparing our results with the published work of Pimenta-Marques et al. in Drosophila. That said, as mentioned above, our findings do not suggest that removing the PCM (in pcmd-1(t3421ts) mutant animals) alters centriole elimination timing in C. elegans.

      I invite the authors to more carefully tread these nuances throughout their manuscript, which otherwise may cast major doubt on their claims.

      See point above.

      2. When investigating the role of Polo-like kinases, the authors assume that centriole elimination must follow (or correlate with) the dynamics of RME-2 (as a proxy for oocyte maturation). What guarantees that the centriole elimination process has to follow oocyte maturation? As far as I could tell, there is no direct evidence presented in the paper about this point. Do the authors have direct data (or reference to another work) that this trend must hold true at all times? I can readily see several places in the paper where this correlation doesn't appear to hold (e.g., in Fig. 4D the centriole elimination precedes the oocyte maturation under pcmd-1 condition).

      We will provide further data supporting the view that oocyte maturation and centriole elimination are correlated, whereby premature oocyte maturation mutants, such as let-60(ga89ts) and kin-18(ok395), exhibit precocious elimination.

      To correctly interpret their results on the epistatic Polo mutants, the authors could examine centriole elimination timing with mutants that can prematurely trigger or delay oocyte maturation (and do so without affecting the centriole biology itself).

      See above point.

      3. Lines 155-159 on the dimness of the SAS-6 signal make me worried about how successfully the transgenes were generated. Could the authors comment on, or perhaps extend in detail in the Methods section, through what assays the transgenes were validated? For example, did the authors try to rescue a SAS-6-/- with a SAS-6::GFP transgene? I would like to see further support for their validity.

      We will explicitly explain in the Material and Methods section that the SAS-6::GFP transgene indeed rescues the sas-6 null phenotype.

      If the authors can demonstrate the validity of their transgenes more reliably, could they possibly comment on the bunch of seemingly random SAS-6::GFP foci in Fig. 1G?

      We will comment on the presence of small SAS-6::GFP foci in the most mature oocytes, which correspond to potential precursors of centriolar elements later assembled in the embryo.

      4. Starting from line 204, the authors use the percentage of oocytes with detached centrioles (from the nucleus) as a proxy for movement to the plasma membrane. This can be very confounding in my view (due to erroneous detachments etc.). As the authors explicitly state that the detachment is a process followed by a directed movement (with a defined velocity) towards the plasma membrane, this calls for a much better measurement in general. The authors should directly measure how far the centrioles are from the closest plasma membrane region in each condition they are examining (and should do this as a function of the "time progression" in different oocytes as they get closer to fertilization).

      As mentioned above, we think that an account of the movement of centriole remnants provides important information about cellular dynamics during oocyte maturation. However, given that this movement is not essential for the elimination of such remnants, it appears that providing additional complex 3D analysis as suggested by the reviewer will not benefit the present manuscript.

      Do the authors observe any propensity in sas-1(t1521ts) oocytes as to where the centrioles are being degraded more prominently in the cytoplasm (i.e., when attached to the nucleus vs. when near the plasma membrane)? They could perform analyses à la their assessments in Fig. S2 and see whether they can extract some more information about this. In other words, I am wondering whether SAS-1 regulates the centriole elimination process more prominently near the nucleus or near the plasma membrane.

      Centriole elimination occurs during pachytene in sas-1(t1521) mutant animals, when nuclei are packed in the gonad and surrounded by little cytoplasm. Therefore, even if foci were to detach from nuclei at this stage, we would not be able to quantify it with certainty. We will discuss these points in the revised manuscript.

      I ask this because the section about "centrioles moving to plasma membrane" appears epiphenomenal and rather random (i.e., the chances of a centriole moving to the plasma membrane appear 50-50 under some control conditions; see control RNAi in Fig. 2G for example). Could the authors explore their existing data more closely (as suggested above), to see whether they could find intriguing correlations that tell us a little more about whether centriole elimination at these two places is achieved differently? Otherwise, I frankly do not think this section contributes significantly to the essence of the story.

      We apologize for the confusion our writing seems to have generated. The chances of moving to the plasma membrane are not 50-50. The actual figure is 78.7% (reported as ~80% in the manuscript, line 187), and stems from the live imaging experiments where every travelling event can be monitored. By contrast, the analysis of fixed specimens is an underestimate as it provides only a snapshot of a dynamic process. We will expand the writing in the revised manuscript to clarify this point.

      Finally, the statements about a deterministic function for the plasma membrane re-localization should be toned down, because unlike what the authors claim in the paper (that ~80% of the centrioles move to plasma membrane), the control data (in Fig. 2B) clearly demonstrates that this number is more like ~60% (hence close to its chances being 50-50).

      Please see response just above.

      The paper carefully quantifies most of the data (for which I sincerely congratulate the authors!), however the experiments in Fig. S3 fall short of this. It would be nice if the authors could do the same here for completion.

      We will provide quantifications for Fig. S3E and S3F. However, due to the high disorganization of plk-1(or683ts); plk-2(ok1936) gonads, the presence of centriolar foci relative to oocyte position cannot be quantified accurately in this case.

      Minor comments:

      1. Sentence in lines 110-113 is too long and perturbs the flow. This should be shortened or be broken into better clauses. Perhaps the following way? "Prior analysis of centriole elimination in C. elegans oogenesis uncovered that this process takes place during diplotene..."

      The text will be modified accordingly.

      What are the orange arrowheads in the figure panels? They are not stated explicitly in the figure legends. My prediction was that they point to regions where centrioles are in another plane (though the overview is depicted from a different slice in the stack). Is this right? Either way, it will be useful to over-guide the reader on these orange arrowheads.

      The meaning of the orange arrowheads is explained in lines 520-521.

      If I am not wrong, the data/graph in Figures S2G and 2E are essentially the same (i.e., the data are duplicated). I couldn't find any statement in the figure legends indicating this. This should be added.

      Apologies for this oversight; the reviewer is correct and we will mention this redundancy in the legend of Fig. S2.

      Some may consider the discussion on C2CD3 a little far-fetched, as this protein localizes to the distal end of centrioles (completely unlike SAS-1). Also, unlike the C. elegans centrioles, mammalian centrioles do not contain a discernible central tube, casting doubt on the speculations made in the Discussion section. I suggest removing this paragraph and instead stating explicitly that it is unclear whether the SAS-1-dependent mechanism could be applicable to other species.

      We will nuance these thoughts, further stressing their speculative nature, but intend to maintain them in some form as they provide a potential parallel that will be of interest to the human cell biology community.

      Could the authors add in their Discussion section some comment/thought on what the remaining GFP::SAS-7 pool (line 300-302) might possibly be? Curiously, there doesn't seem to be any structure associated with it in their EM tomograms, so it would be helpful to guide the reader further on this interesting finding.

      Although we would love to comment on this further, the remaining GFP::SAS-7 foci lack ultrastructural organization and do not exhibit recognizable electron densities. That this is the case will be stated explicitly in the revised manuscript.

      Reviewer #3 (Significance (Required)):

      General Assessment: This paper's strength is in its rigorous cell biology approaches to tackle a fundamental developmental biology problem. However, some of their conclusions are too firm while not being well-supported by the data, so the paper requires major revision before its publication.

      Advance: Discovery of a new molecular player in the centriole elimination process in worm oocytes, which can pave the way for future discoveries of centriole elimination mechanisms in other species. It is not yet clear whether the results will be broadly applicable, as some of the findings presented are in stark contrast to previous studies published on centriole elimination processes in Drosophila oocytes (e.g., Pimenta-Marques et al. 2016 Science). However, as summarized in the above section, these conclusions require further experimental evidence/support.

      Audience: Centriole elimination mechanisms are not widely studied, so I am not entirely sure whether this piece will be of immediate interest to the broad cell biology community. It will certainly be of general interest to several groups studying centriole elimination mechanisms, as well as developmental biologists trying to understand the oocyte maturation process.

      My expertise: Molecular and cellular mechanisms of cytoplasmic organization in development

    1. There’s tremendous value in coming into yourself as a person. Why wouldn’t that be true online, too? Recognizing that my online self was lacking, I decided to learn how to be myself on the internet.

      It is impossible to present yourself truly on the internet, to come into yourself as a person, when everything is highly self-conscious and selective, as well as limited and misleading. In person we struggle to understand each other. This may be because of the internet so I have no frame of reference, but how is the internet any better? Maybe because your inner dreams and thoughts can be shared alongside pictures of you - I am realizing what I know of internet representation of people is basically instagram and snapchat so I can't imagine a different reality. To accurately represent oneself you must be honest, a quality we are all incapable of to an extent, and I think the internet and its way of falsely representing things might create so much insecurity that this only pushes us further from honesty. You can't hide nearly as much when you are in front of people.

    1. Reviewer #1 (Public Review):

      Summary of what the authors were trying to achieve.

      This paper studies the possible effects of tACS on the detection of silence gaps in an FM-modulated noise stimulus. Both FM modulation of the sound and the tACS are at 2Hz, and the phase of the two is varied to determine possible interactions between the auditory and electric stimulation. Additionally, two different electrode montages are used to determine if variation in electric field distribution across the brain may be related to the effects of tACS on behavioral performance in individual subjects.

      Major strengths and weaknesses of the methods and results.

      The study appears to be well-powered to detect modulation of behavioral performance with N=42 subjects. There is a clear and reproducible modulation of behavioral effects with the phase of the FM sound modulation. The study was also well designed, combining fMRI, current flow modeling, montage optimization targeting, and behavioral analysis. A particular merit of this study is to have repeated the sessions for most subjects in order to test repeat-reliability, which is so often missing in human experiments. The results and methods are generally well-described and well-conceived. The portion of the analysis related to behavior alone is excellent. The analysis of the tACS results is also generally well described, candidly highlighting how variable results are across subjects and sessions. The figures are all of high quality and clear. One weakness of the experimental design is that no effort was made to control for sensation effects. tACS at 2Hz causes prominent skin sensations which could have interacted with auditory perception and thus, detection performance.

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      Unfortunately, the main effects described for tACS are encumbered by a lack of clarity in the analysis. It does appear that the tACS effects reported here could be an artifact of the analysis approach. Without further clarification, the main findings on the tACS effects may not be supported by the data.

      Likely impact of the work on the field, and the utility of the methods and data to the community.

      The central claim is that tACS modulates behavioral detection performance across the 0.5s cycle of stimulation. However, neither the phase nor the strength of this effect reproduces across subjects or sessions. Some of these individual variations may be explainable by individual current distribution. If these results hold, they could be of interest to investigators in the tACS field.

      The additional context you think would help readers interpret or understand the significance of the work.

      The following are more detailed comments on specific sections of the paper, including details on the concerns with the statistical analysis of the tACS effects.

      The introduction is well-balanced, discussing the promise and limitations of previous results with tACS. The objectives are well-defined.

      The analysis surrounding behavioral performance and its dependence on the phase of the FM modulation (Figure 3) is masterfully executed and explained. It appears that it reproduces previous studies and points to a very robust behavioral task that may be of use in other studies.

      There is a definition of tACS(+) vs tACS(-) based on the relative phase of tACS that may be problematic for the subsequent analysis of Figures 4 and 5. It seems that phase 0 is adjusted to each subject/session. For argument's sake, let's assume the curves in Fig. 3E are random fluctuations. Then aligning them to the best-fitting cosine will trivially generate an FM-amplitude fluctuation with a cosine shape, as shown in Fig. 4a. Selecting the positive and negative phases of that will trivially give values larger and smaller than sham, respectively, as shown in Fig. 4b. If this is correct, and the authors would like to keep this way of showing results, then one would need to demonstrate that this difference is larger than expected by chance. Perhaps one could randomize the 6 phase bins in each subject/session and execute the same process (fit a cosine to curves 3e, realign as in 4a, and summarize as in 4b). That will give a distribution under the null, which may be used to determine if the contrast currently shown in 4b is indeed statistically significant.
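
      The shuffling procedure suggested in this comment can be sketched as follows. This is a minimal illustration only, not the authors' analysis pipeline: the function names, the array layout (one row per subject/session, one column per phase bin), and the choice of the bin nearest the fitted peak as tACS(+) and the opposite bin as tACS(-) are all assumptions made for the example. It also assumes an even number of equally spaced phase bins.

```python
import numpy as np


def realigned_contrast(data, phases):
    """Mean tACS(+) minus tACS(-) contrast after per-row cosine realignment.

    data   : array (n_rows, n_bins), one row per subject/session (assumed layout)
    phases : array (n_bins,), equally spaced phase lags in radians
    """
    # Least-squares fit of a*cos(phase) + b*sin(phase) + c to every row at once.
    X = np.column_stack([np.cos(phases), np.sin(phases), np.ones_like(phases)])
    coef, *_ = np.linalg.lstsq(X, data.T, rcond=None)  # shape (3, n_rows)
    preferred = np.arctan2(coef[1], coef[0])           # fitted peak phase per row

    # Circular distance from every bin to the fitted peak; nearest bin is "(+)".
    diff = phases[None, :] - preferred[:, None]
    dist = np.abs(np.angle(np.exp(1j * diff)))
    peak = dist.argmin(axis=1)
    # The opposite bin (half a cycle away) is treated as "(-)".
    n_bins = data.shape[1]
    trough = (peak + n_bins // 2) % n_bins

    rows = np.arange(data.shape[0])
    return float(np.mean(data[rows, peak] - data[rows, trough]))


def permutation_p_value(data, phases, n_perm=1000, seed=0):
    """Compare the observed contrast with a null built by shuffling phase bins."""
    rng = np.random.default_rng(seed)
    observed = realigned_contrast(data, phases)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle the bins independently within each row, then repeat the
        # identical fit-realign-summarize procedure on the shuffled data.
        shuffled = rng.permuted(data, axis=1)
        null[i] = realigned_contrast(shuffled, phases)
    # One-sided p-value with the usual +1 correction.
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p
```

      Applied to pure noise, the realigned contrast is positive by construction, which is the concern raised above. The useful quantity is therefore how the observed contrast compares with the shuffled distribution, not whether it differs from zero.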

      Results of Fig 5a and 5b seem consistent with the concern raised above about the results of Fig. 4. It appears we are looking at an artifact of the realignment procedure, on otherwise random noise. In fact, the drop in "tACS-amplitude" in Fig. 5c is entirely consistent with a random noise effect.

      "To better understand what factors might be influencing inter-session variability in tACS effects, we estimated multiple linear models ..." This post hoc analysis does not seem to have been corrected for multiple comparisons across these "multiple linear models". It is not clear how many different things were tried. The fact that one of them has a p-value of 0.007 for some factors with amplitude-difference, while these factors did not play a role in the amplitude-phase, suggests again that we are not looking at a lawful behavior in these data.

      "So far, our results demonstrate that FM-stimulus driven behavioral modulation of gap detection (FM-amplitude) was significantly affected by the phase lag between the FM-stimulus and the tACS signal (Audio-tACS lag) ..." There appears to be nothing in the preceding section (Figures 4 and 5) to show that the modulation seen in 3e is not just noise. Maybe something can be said about 3b on an individual subject/session basis that makes these results statistically significant on their own. Maybe these modulations are strong and statistically significant, but just not reproducible across subjects and sessions?

      "Inter-individual variability in the simulated E-field predicts tACS effects" Authors here are attempting to predict a property of the subjects that was just shown to not be a reliable property of the subject. Authors are picking 9 possible features for this, testing 33 possible models with N=34 data points. With these circumstances, it is not hard to find something that correlates by chance. And some of the models tested had interaction terms, possibly further increasing the number of comparisons. The results reported in this section do not seem to be robust, unless all this was corrected for multiple comparisons, and it was not made clear?
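
      To make the multiple-comparisons concern concrete, here is a small sketch of two standard family-wise corrections, Bonferroni and Holm. The count of 33 tests is taken from the comment above; the function names and any p-values passed in are hypothetical and are not results from the manuscript.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans, in the input order, marking which
    hypotheses are rejected at family-wise error rate `alpha`.
    """
    m = len(p_values)
    # Visit p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Per-test threshold when 33 models are tested at a family-wise alpha of 0.05.
threshold_33 = bonferroni_threshold(0.05, 33)
```

      With 33 tests the Bonferroni threshold is 0.05 / 33, roughly 0.0015, so a nominal p-value would need to fall below that to survive this (conservative) correction. Holm is uniformly at least as powerful while controlling the same error rate.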

      "Can we reduce inter-individual variability in tACS effects ..." This section seems even more speculative and with mixed results.

      Given the concerns with the statistical analysis above, there are concerns about the following statements in the summary of the Discussion:

      "2) does modulate the amplitude of the FM-stimulus induced behavioral modulation (FM-amplitude)"
      This seems to be based on Figure 4, which leaves one with significant concerns.

      "4) individual variability in tACS effect size was partially explained by two interactions: between the normal component of the E-field and the field focality, and between the normal component of the E-field and the distance between the peak of the electric field and the functional target ROIs."
      The complexity of this statement alone may be a good indication that this could be the result of false discovery due to multiple comparisons.

      For the same reasons as stated above, the following statements in the Abstract do not appear to have adequate support in the data:
      "We observed that tACS modulated the strength of behavioral entrainment to the FM sound in a phase-lag specific manner. ... Inter-individual variability of tACS effects was best explained by the strength of the inward electric field, depending on the field focality and proximity to the target brain region. Spatially optimizing the electrode montage reduced inter-individual variability compared to a standard montage group."
      In particular, the evidence in support of the last sentence is unclear. The only finding that seems related is that "the variance test was significant only for tACS(-) in session 2". This is a very narrow result to be able to make such a general statement in the Abstract. But perhaps this can be made more clear.
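
      The multiple-comparisons concern raised above (33 candidate models tested on N=34 data points) can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that the 33 tests are independent and that each uses a per-test threshold of 0.05. Neither assumption is taken from the manuscript, and correlated models would inflate the error rate less than this idealized case suggests.

```python
# Illustrative arithmetic only. Assumptions not taken from the manuscript:
# the 33 model tests are independent, and each uses alpha = 0.05.


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


n_models = 33  # number of candidate models quoted in the review
fwer = familywise_error_rate(n_models)
threshold = bonferroni_threshold(n_models)

print(f"family-wise error rate for {n_models} tests: {fwer:.3f}")
print(f"Bonferroni per-test threshold: {threshold:.5f}")
```

      Under these assumptions the chance of at least one spurious "significant" model is roughly 0.8, and a Bonferroni-corrected per-test threshold would be about 0.0015.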

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We would like to thank reviewers for their insightful comments.

      Overall, there were two major concerns/suggestions:

      • Applicability to humans of the increase of BTC in non-alcoholic steatohepatitis (NASH) and mechanisms of downregulation of BTC by omega-3. We have now analyzed 3 additional human gene expression datasets and show that BTC is not only increased in human NASH (as we had already shown in the liver cancer meta-analysis), but is also decreased in the livers of patients who received omega-3.

      • One of the reviewers suggested investigating a potential mechanism of how BTC is regulated by omega-3 fatty acids. Although a complete answer to this question would require entirely new studies, we performed the additional investigation that was possible within a reasonable timeframe. We found that the transcription factor FOXO3 (a well-known inhibitor of carcinogenesis) is a highly probable mediator of the DHA inhibitory effect on BTC.

      See all details of items 1 and 2, as well as answers to other (less critical) concerns, below after each specific question.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This work by Padiadpu and colleagues investigates the mechanism by which PUFA of the n-3 series (mostly DHA) may influence NAFLD progression using systems biology analysis and multiple omics analysis. The work is interesting and may provide a novel view of the topic. However, there are a number of issues the authors may wish to consider in order to improve their manuscript.

      Major issues: Clarity: Since the authors refer to previously published experiments, they must refer to this work in the figure legends and improve the clarity of such legends. Here is a list of issues that must be fixed:

      Fig.1: First panel is not clear. What does the table tell the reader? What are the effects of the different diets on NAFLD?

      All the transcriptomic data are newly generated from the samples of previously published studies. The table shows the number of features changed by DHA and/or EPA in each of the -omics and phenotypic data used in the analysis.

      I understand that the results are published elsewhere, but the authors must provide information regarding the NAFLD/ NASH scores.

      We now added a supplementary table 1a showing the scores.

      Fig.4: Why is there sometimes a DHA diet, sometimes DHA and EPA. Legend is not clear. What does WD + Mean? I guess it is olive oil... But the legend must be improved.

      We added details in the legend for more clarity. Specifically, WD+O means WD + olive oil added as a control for WD+DHA and WD+EPA. As described in the 2nd paragraph of the Results, when both EPA and DHA had similar and significant effects in reversing the WD effect, the parameter was assigned to the “EPA&DHA category”. When only WD+DHA or WD+EPA was significantly changed vs WD+O, the parameter was assigned to the “DHA category” or “EPA category”, respectively.

      One issue the authors may consider trying to fix is the specificity of the effect of DHA on BTC.

      Is it really specific? It seems to me that EPA has more or less the same effect. If the effect is DHA-specific, then make this clearer throughout the text.

      Although BTC expression was reduced by both DHA and EPA compared to WD, DHA had a statistically significantly stronger effect than EPA (Fig. 3D).

      Another issue the authors may wish to investigate is the relationship between W3 consumption and BTC expression in studies performed by other labs (if available on Gene expression omnibus?).

      Thanks for the suggestion. We used publicly available data from human and mouse studies, which showed a significant increase in liver BTC gene expression in NASH in multiple datasets, while a human trial with omega-3 treatment for one year showed a significant reduction in BTC expression (Figures 3F - human data, S3G - mouse data).

      Finally, a key issue would be to identify the mechanism by which DHA inhibits BTC expression? How does this happen? could such inhibition be induced by other fatty acids of the W3 series? I understand that this is not easy to address but it would significantly strengthen the manuscript.

      Thanks to your question, we investigated and found at least one potential mechanism contributing to how “DHA inhibits BTC expression”. See details in the answer to the next question. As for “other fatty acids”, while we agree this is an important question, it is outside the scope of the current study but will be investigated in future studies.

      Moreover, it might be possible to identify the set of genes highly co-regulated with BTC expression and to investigate the possible transcription factors at play in the control of such gene set.

      We really appreciate this question as our efforts in this direction provided one potential mechanism. A direct screen of transcription factor (TF) motifs in genes co-regulated with BTC did not provide any clear results. Therefore, we implemented a combination of network analysis and screen for motifs in BTC gene with the in vivo and in vitro treatment results and found FOXO3 as a candidate TF regulated by DHA upstream of BTC.

      See details of the analysis and results in a new Supplementary Figure S6 and corresponding text located at the end of the results.

      Minor: the authors use the term "beneficial" transcriptome alterations by DHA.

      I do not think it is correct to use "beneficial".

      We agree and have removed the word "beneficial".

      Reviewer #1 (Significance (Required)):

      Strength: This paper uses new approaches to investigate the relationship between W3 consumption and liver gene expression and its relevance to chronic metabolic liver diseases.

      The experiments and data set used to perform systems biology are from an excellent lab (the authors' lab), which has published many important and reproducible discoveries in the field of regulation of gene expression by dietary fatty acids.

      The work has high translational relevance in medicine / hepatology / metabolism.

      I am not a qualified reviewer to assess the systems biology that has been done.

      Limitation: The mechanistic link between DHA consumption and BTC expression is not very clear. The specificity of this effect could also be tested (DHA vs other W3 and/or W6).

      Although BTC expression was reduced by both DHA and EPA compared to WD, DHA had a significantly stronger effect than EPA (Fig. 3D). Other omega fatty acids were not tested, but this can be done in future studies.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors file a manuscript describing the impact of the suppression of betacellulin as a key mechanism to counteract fibrosis and inflammation in NASH by modulating fatty acids in WD-fed mice.

      Major Comments: (i) No histological analysis was presented and indeed this is of clinical relevance for NASH since diagnosis is still based on biopsy.

      While histological evaluation was presented in the originally published papers (PMID: 28422962, 23303872), it is now provided in Supplementary Table S1a.

      (ii) Human comparative analysis: is done with HCC not with NASH patients.

      This cancer-related dataset is most likely obtained from different etiologies.

      I would suggest comparing these mouse datasets with GSE48452 (human NAFLD-NASH spectra).

      Thanks for this important question. We have now analyzed available human NASH data and show a significant increase of BTC expression in two datasets, while a human trial with omega-3 treatment for one year showed a significant reduction of BTC expression (Figure 3F), resembling our observations in mice.

      (iii) to compare the inflammation and fibrosis (also lipid metabolism), one can compare these mouse datasets with GSE222576 and cite this preprint (https://doi.org/10.21203/rs.3.rs-2009380/v1)

      Using the suggested dataset (of a chemically induced liver fibrosis), we first observed that Btc gene expression was significantly increased over 10 weeks of the model and have now included this result in Fig. S3G.

      We also queried the 66 genes from the network modules described by the authors to check their changes in our NASH model. We observed that 28 genes were differentially expressed in NASH, with 14 of them belonging to the module that the authors named “Pathways in Cancer”. Other genes were from lipid metabolism (4 genes), immunity (2 genes) and inflammation (2 genes). In addition, we observed that several genes that we found to be regulated by omega-3 and changed in this fibrosis model included other inflammatory genes such as classical macrophage genes (Mmp12, Lgals3, Cd68, Trem2), as well as fibrosis genes (Col4a1, Col27a1, Itga2b, Itga8) and lipid metabolism genes (Scd2, Lpl, Soat1). Of note, the preprint has been published and we now cite the corresponding article.

      Minor comments:

      (i) The heatmap in Figure 1B and another heatmap should show all mice not the average to see the variability

      A supplementary figure showing the individual mouse data as an additional heatmap has been added to show the variability and similarity (Figure S1D).

      Reviewer #2 (Significance (Required)): The authors file a manuscript describing the impact of the suppression of betacellulin as a key mechanism to counteract fibrosis and inflammation in NASH by modulating fatty acids.

      This is a well-designed experiment, and the results are of interest to hepatologists and should indeed be published after consideration of the following points.

      Strength is multiOMICs approach.

      Weakness is human applicability.

      We improved human applicability by investigating 3 additional human datasets of NASH (Fig. 3F) and finding consistent changes in BTC expression closely resembling our observations in mouse NASH model, including one trial with omega-3 treatment of patients for one year showing significant reduction in BTC gene expression.

    1. Author Response

      Reviewer #1 (Public Review):

      This study demonstrates that a hybrid measurement method increases the resolution of mouse USV localization threefold. This increased resolution makes it possible to revise previous occurrence frequency measures for female vocalizations and establishes the existence of vocal dominance in triadic interactions. The method is well described and its efficiency is carefully quantified. A limitation of the study is the absence of ground truth data, which could eventually have been generated with miniaturized loudspeakers in mouse puppets. However, a careful error estimation partially compensates for the absence of these likely challenging calibrations. In addition, the conclusions take into account this uncertainty. The gain in accuracy with respect to previous methods is clear and the impact of localisation accuracy on biological conclusions about vocalisation behavior is clearly exemplified. This study demonstrates the impact of the new method for understanding vocal interactions in the mouse model, which should be of tremendous interest for the growing community studying social interactions in mice.

      We have performed the requested additional ground truth estimate using a movable miniature speaker; for more details see point 2 of Reviewer 2 and the new supplementary figure.

      Reviewer #2 (Public Review):

      Past systems for identifying and tracking rodent vocalizations have relied on triangulating positions using only a few high-quality ultrasonic microphones. There are also large arrays of less sensitive microphones, called acoustic cameras that don't capture the detail of the sounds, but do more accurately locate the sound in 3D space. Therefore the key innovation here is that the authors combine these two technologies by primarily using the acoustic camera to accurately find the emitter of each vocalization, and matching it to the high-resolution audio and video recordings. They show that this strategy (HyVL) is more accurate than other methods for identifying vocalizing mice and also has greater spatial precision. They go on to use this setup to make some novel and interesting observations. The technology and the study are timely, important, and have the potential to be very useful. As machine learning approaches to behavior become more widespread in use, it is easy to imagine this being incorporated and lowering entry costs for more investigators to begin looking at rodent vocalizations. I have a few comments.

      1) What is the relationship of the current manuscript to this: https://www.biorxiv.org/content/10.1101/2021.10.22.464496v1 which has a number of very similar figures and presents a SLIM-only method that reportedly has lower precision than the current HyVL approach. Is this superseded by the submitted paper?

      The referred manuscript (now published in Scientific Reports) is indeed related to the current work: The currently presented system is based on the integration between SLIM (based on 4 high quality microphones) and Beamforming (based on the 64-channel microphone array). The accuracy of SLIM is generally lower than that of HyVL, but it makes essential contributions to the overall accuracy of HyVL through the integration of the complementary strengths of the two methods/microphone arrays (see Fig. 3A, L-shape of errors). To our knowledge, SLIM was the previously most accurate technique (based on 4 microphones, see comparison in the Discussion), but HyVL exceeds this by a substantial margin. Some figures appear similar mostly due to related code in the underlying analysis pipeline and visualization scripts (e.g. the half-disc densities). However, the set of dyadic and triadic recordings was collected specifically for the present study, and all top-level analyses were performed separately. The single mouse (C57Bl/6 WT) ground truth dataset is shared between the two studies, where in the SLIM paper only the USM4/SLIM part was evaluated (leading to a correspondingly lower, single animal accuracy).

      We felt that the level of detail above would probably impede the reading of the manuscript, and we have therefore added a subset of the above clarifications to the methods and the first time the other study is mentioned.

      2) Can the authors provide any data showing the accuracy of their system in localizing sounds emitted from speakers as a function of position and amplitude? I am imagining that it would be relatively easy to place multiple speakers around the arena as ground truth emitting devices to quantify the capabilities of the system.

      Ground truth data is critical for any meaningful comparison. First, we would like to highlight that we already provided ground truth data in the previous version of the manuscript: In Fig. 3C, we analyzed vocalization data from (1) trials with just a single mouse as well as (2) vocalizations at times when all mice were far apart relative to the accuracy of HyVL (>100 mm, i.e. >25x the accuracy of HyVL), where the chances of erroneous assignment are negligible. We think that these tests are the most relevant, as they are conducted with the relevant sounds, at their actual intensity, spectral profile and emitter acoustics.

      In addition, we have now conducted a series of tests with sounds produced by a miniature speaker placed in 25 different locations to demonstrate the lower bound of accuracy achievable with the system. The tests indicate an accuracy of MAE < 1 mm under these ideal conditions, i.e. without the absorption of the mouse bodies, varying direction of emission of the mouse snout, varying intensity, varying spectral content, duration, etc. Exploring the dependence on all these parameters is interesting in itself, but requires a dedicated, detailed study. The detailed experimental conditions and results are now provided in Supplementary Fig. 4, including a quantification of the dependence on amplitude.

      3) How is the system's performance affected by overlapping vocalizations? It might be useful to compare the accuracy of caller identification for periods where only one animal is calling at a time vs. periods where multiple animals are simultaneously calling.

      This is an excellent question. Our current code for detecting vocalizations cannot automatically determine if one or multiple vocalizations are concurrently present. We have therefore manually checked all vocalizations for overlapping instances, including those in triadic recordings with two males, where this would be expected to occur most frequently.

      We considered vocalizations to be overlapping if the overlapping constituent time-frequency traces did not form a harmonic stack. Overall, overlaps were surprisingly rare. We did find a couple of cases (<0.1%) where our detection algorithm produced a longer vocalization interval that contained multiple, differently shaped vocalization traces that, when re-analyzed in shortened time-frequency bins with beamforming, belonged to two different males. Note here that beamforming is separately performed from the onset to the end of each vocalization, so the cumulative heatmap can change depending on these onset and end times, which are normally determined by our detection algorithm.

      However, although the identity of the assigned vocalizer could shift in these very rare cases depending on which time bin was re-analyzed, the system’s localization performance remained in principle unaffected: as mentioned above, shorter time bins on non-overlapping parts correctly show the origin of the vocalizations in this case. A solution to this issue could therefore be a USV detection algorithm that is able to detect the overlap based on the spectral shapes and parse the vocalizations apart. During the beamforming, each vocalization can then be separately localized by restricting the beamforming to the corresponding time and frequency range. Further, the analysis could be refined so that multiple salient peaks can be detected in the soundfield estimate. This would, however, substantially change the analysis approach, i.e. rather than a single estimate per USV, a sequence of soundfield estimates would have to be computed and later fused again. Since such a procedure uses less data per single estimate, it also increases the possibility of false positives, which, in the current situation with very few overlaps in time, would likely reduce the overall accuracy of the system. We therefore decided not to modify the algorithm in this direction, but we agree that ideally a joint approach, combining separation on the spectrogram and soundfield level, should be pursued. For the present data, if a time window was analyzed such that the intensity map of the sound field contains multiple hotspots of approximately equal magnitude, the USV would likely remain unassigned, because the within-soundfield uncertainty would be higher than for a single peak, and this would reduce the MPI. However, given the rarity of these cases in our dataset, we do not think that their exclusion would change the results appreciably. This information was added as a paragraph to the Discussion.

      It is worth noting that HyVL is very robust: There were a number of cases (<5%) where environmental dampening in combination with harmonic stacking produced interesting time-frequency traces in some of the USM4 microphones, but our system did not have any issue spatially localizing what seems like a smeared vocalization trace. We provide a few examples of this kind in a short video (see Rebuttal Video 2 and the legend at the bottom of this document), where the overlap is also reflected in the intensity map of the sound field, overlaid onto the platform.

      4) Can the authors comment on how sound shadows cast by animals standing between the caller and a USM4 affect either the accuracy of identification or the fidelity of the vocal recording?

      An important point to raise. Sound scattering and dampening caused by the conspecifics of the vocalizing animal can impede the accuracy of any sound localization system, but can unfortunately not be avoided in a social setting. To address this issue, we raised all USM4 microphones by ~12 cm above the interaction platform to minimize the instances of sound blocked by the mice. Further, the Cam64 device should largely be unaffected by sound shadows as it is centrally located above the platform. We have added a modified version of the above comment to the discussion under the heading "Current limitations and future improvements of the presented system".

      5) I'm a bit confused about how the algorithm uses the information from the video camera. Reading through the methods, it seems like they primarily calculate competing location estimates by the two types of microphone data and then make sure that a mouse is in close proximity to one location, discarding the call if there isn't. Why did the authors choose this procedure rather than use the tracked position of the snouts as constrained candidate locations and use the microphone data to arbitrate between them? Do they think that their tracking data are not reliable or accurate enough?

      Thanks for this important suggestion, which we have actually grappled with a lot during the analysis. First of all, the visual tracking data, in particular the manual data, is in our opinion (based on human visual identification) near perfect (within the limits of the video resolution, pixel resolution = 0.8 mm), i.e. on the order of 1-2 mm, and is therefore not the source of any unattributable vocalizations. If we understand the reviewer correctly, then we indeed perform the attribution as suggested, based on the tracked snouts of all mice, specifically by measuring the MPIs of both acoustic location estimates for all mice and then choosing the most reliable one. The attributions can be grouped into 3 cases: (i) estimated origin close to one snout, and snouts rather far apart, (ii) estimated origin close to one snout and snouts close, and (iii) estimated origin not close to either snout. Case (i) is easy to address and case (ii) is appropriately handled by the mouse probability index, but case (iii) is tricky. Since the vocalization has to come from one of the mice, this already indicates that the localization is not working well in this case. Therefore we found it prudent (similar to Neunuebel et al. 2015) to not assign in these cases. Interestingly, the MPI is not useful in these cases: due to the exponential dependence of the normal density on distance, a case with a distance of 50 mm to one snout and 60 mm to another snout could, for example, lead to an MPI close to 1, which is likely not trustworthy. We have described this in the Methods as follows:

      "This distance threshold mainly serves to compensate for a deficiency of the 𝑀𝑃𝐼: if all mice are far from the estimate, all 𝑃𝑘 are extremely small, however, the 𝑀𝑃𝐼𝑘 will often exceed 0.95."
      Due to the inherent limit for localizing very quiet, short USVs by any system, we think this kind of selection (introduced originally by Neunuebel et al 2015) is a valuable and necessary step in the processing to avoid confusions (which are of course already substantially reduced through HyVL here).
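
      The 50 mm / 60 mm example above can be reproduced with a few lines of code. The following is a minimal sketch, not the HyVL implementation: it assumes that each P_k is an isotropic normal density of the snout-to-estimate distance and that MPI_k is P_k divided by the sum over all mice. The function names, the spread `sigma_mm` and the cutoff `max_distance_mm` are hypothetical placeholders; only the 0.95 criterion is quoted from the response.

```python
import math
from typing import List, Optional


def mouse_probability_index(distances_mm: List[float], sigma_mm: float) -> List[float]:
    """Normalized normal-density weights, one per mouse (assumed MPI form).

    The density's normalizing constant cancels in the ratio, so only the
    exponents are needed; subtracting the largest exponent avoids underflow
    when every mouse is far from the acoustic estimate.
    """
    exponents = [-0.5 * (d / sigma_mm) ** 2 for d in distances_mm]
    largest = max(exponents)
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def assign_vocalizer(
    distances_mm: List[float],
    sigma_mm: float,
    max_distance_mm: float,  # hypothetical distance threshold
    min_mpi: float = 0.95,
) -> Optional[int]:
    """Return the index of the assigned mouse, or None if left unassigned."""
    nearest = min(range(len(distances_mm)), key=lambda k: distances_mm[k])
    # Case (iii): the estimate is far from every snout, so do not assign,
    # even though the MPI alone would look decisive.
    if distances_mm[nearest] > max_distance_mm:
        return None
    mpi = mouse_probability_index(distances_mm, sigma_mm)
    return nearest if mpi[nearest] >= min_mpi else None


# Distances of 50 mm and 60 mm, with an illustrative spread of 10 mm.
mpi_far = mouse_probability_index([50.0, 60.0], sigma_mm=10.0)
print("MPI with both snouts far away:", [round(m, 4) for m in mpi_far])
print("assignment:", assign_vocalizer([50.0, 60.0], sigma_mm=10.0, max_distance_mm=20.0))
```

      With these illustrative values the nearer mouse receives an MPI above 0.95 although both snouts are far from the estimate, which is why the sketch applies the distance check before the MPI criterion.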

      6) I guess the authors have code that we can run, but I couldn't access it. The manuscript describes the algorithms and equations that are used to calculate the location, but this doesn't really give me a feel for how it works. If you want to have the broadest impact possible, I think you would do well to make the code user-friendly (maybe it is, I don't know). In pursuit of that goal, I would suggest that the authors devote some of the paper to a guided example of how to use it.

      While the code was made available to the reviewers via the link at the beginning of the manuscript (p2, before abstract), we completely agree that this method of distribution is not very accessible. We have therefore created a publicly available GitHub repository (https://github.com/benglitz/HyVL) which hosts the code and details its use on the basis of a sample data set (which is available to the reviewers in the repository link, and later to the public under https://doi.org/10.34973/7kgc-ta72). While we do provide a sample video and analysis workflow there, our data analysis pipeline is quite integrated and other labs will likely use different pipelines. We have therefore tried to make the core functions independent of our pipeline and thus easy to integrate by others into their analysis pipelines.

      Reviewer #3 (Public Review):

      The present manuscript describes a new method to identify the emitter of ultrasonic vocalisations during social interactions between 2 or 3 mice. The method combines two technologies (an "acoustic camera" and a set of four microphones) and succeeds in increasing the spatial precision and the attribution of USV emission to one of the mice. The manuscript describes the characteristics and advantages of each method and the advantages of using both to optimize the identification of USV emitter. The authors used the method to confirm that females are also vocalising during male-female interactions and that females emit USV mostly during nose-nose contact while this was not the case for males. Interestingly, the authors identified that the vocal behaviour of two competing males was strongly asymmetric when facing a female. This was not the case for two females facing one male.

      The method is really promising since the identification of the emitter of USVs during mouse social interactions is a necessary step to speed up our understanding of this communication modality. The increase in spatial precision and in the proportion of attributed vocalisations is non-negligible and will be of great utility in the future.

      We would like to thank the reviewer for this positive perspective on the future utility of our system.

      Generally, the statistical analyses should be adjusted. Indeed, the statistical analyses do not consider the fact that the same individuals were recorded several times (if we understood well the methods). Each point was considered independent (in non-parametric Wilcoxon tests), while this is not the case given the repetitions with the same individuals (the number of repeated encounters per individual should be given in the methods section, by the way). We strongly recommend revising the statistical analyses of the results in Figures 4 and 5. In addition, it could be interesting to check whether the vocal behaviour is stable within each individual (i.e., a male that is vocalising frequently in one situation vocalises always frequently in other situations).

      We generally agree with this suggestion: In order to properly conduct the analysis for individuals as you suggest, a balanced dataset should be used. We had initially collected such a balanced dataset, which was previously not detailed in the manuscript, as the focus was on USV localization/attribution and hence only the recordings containing USVs were analyzed (detailed now in the beginning of Results and Methods). However, overall, the probability of a recording containing vocalizations at all is low: in our balanced set only 23/112 recordings contained vocalizations. We therefore had collected additional recordings with the best vocalizers which created the previously analyzed set of 83 recordings containing USVs recorded with all microphones. This dataset is therefore dominated by recordings from mice that are active vocalizers. While this does not raise any issue for the estimation of the accuracy of the method (Figure 3) or the female vocalizations (Figure 4, because recordings were always randomized across female mice), it precludes an encompassing analysis of individual differences in Figure 5, i.e. the dyadic-triadic comparison. In the new Figure 5, we address the reviewer's question for the dyadic recordings, finding that the current set of recordings does not provide sufficient evidence that individual male mice had significantly different vocalization rates. We would, however, like to point out that this is likely a consequence of the n=4 recordings that are compared here. For the female mice, we also did not find differences in vocalization rates, which is based on n=14 recordings and thus a more reliable result (p=0.16, 1-way ANOVA with factor individual).

      For the triadic recordings, however, we unfortunately do not have complete information at the experiment level due to a limitation in the experiment execution: the video stream was accidentally started after all mice had been placed on the platform, and since same-sex animals are not visually separable (while the female mice are separable from the males, based on a slightly shaved region on their head), we cannot completely assess this question in triadic recordings based on the available data. When additionally including the triadic recordings and assuming a single vocalizer (combining all male USVs, see below for why the males could not be assigned in the triadic condition), the male individual comparison can be approximately performed with n=8 recordings, and then the dependence on individual becomes borderline significant (p=0.028, 2-way ANOVA with factors individual and condition).

      For the comparison of vocalization rates in the previous Figure 5 that the reviewer was referring to, we cannot perform a rigorous analysis on the individual level, due to the lack of balance. While we thus agree that differences between individual mice can contribute to the differences observed, we do not think that this would change the conclusion that one of the mice dominates the vocal emissions. If the reviewers agree, we would thus leave Figures 6 (old Fig. 5) and new Figure 7 (behavioral confirmation of dominant/subordinate division) as part of the manuscript, with a clear cautioning about the possible contribution of individual differences to the observed differences. If the reviewers find it inappropriate to leave the results based on the unbalanced dataset in, all results after figure 5 could also be excluded (although we would find this unfortunate, given the additional time and effort we have invested in these).

      It is not easy to understand the rationale behind testing animals in pairs and in triads from the beginning of the manuscript. The authors should better introduce this aspect in the manuscript, especially given the fact that biological results deal with this aspect in Figure 5. The authors might strengthen the parts of the biological results extracted from their new method.

      Thank you for pointing out the need for clarification regarding the rationale behind testing animals in pairs and in triads. Courtship interactions are particularly vocal and social, which makes them of interest to many fields, e.g. research on neurodevelopmental disorders [3,4]. Because mice compete naturally during courtship interactions and therefore vocalize at close distances, high localization accuracy is particularly beneficial here: it allows disentangling USVs emitted by animals that are close together. We adapted the introduction to better reflect this reasoning and added a paragraph both in the introduction and where the biological results from old Fig. 5 / new Fig. 6 are summarized.

      More specifically, the fact that one male takes over the vocal behaviour within a triad is of high interest. Nevertheless, some behavioural data would be needed to strengthen these findings.

      We agree that this is an interesting finding and also agree that some additional behavioral analysis is useful to complement it. In order to arrive at this analysis, we performed all-frame, 3-animal tracking on the 14 triadic recordings with two males. This required switching to skeleton tracking with SLEAP [5] in addition to manual post-processing to ensure that no identity switches occur. In each recording the dominant male was then defined as the one that emitted more vocalizations, and then the vocalization-independent spatial interaction histogram was computed, similar to the ones in Fig. 4, but now separating between the dominant and the subordinate males (see new Figure 7). The results are consistent with the most typical location of vocalization of the male, in proximity to the female abdomen: the dominant male's spatial interaction histogram (Fig. 7A) was more clearly peaked at the location of the female abdomen very close to the male's snout than the subordinate male's histogram (Fig. 7B), which shows up very clearly in the difference between the normalized histograms (Fig. 7C). Significance analysis was performed using 100x bootstrapping on the relative spatial positions to estimate 99% confidence bounds around the histograms of the dominant and the subordinate male, respectively. Significance at a level of p<0.01 highlights multiple relative spatial positions (Fig. 7D), including the one proximal to the snout, which has the largest absolute difference (Fig. 7C). Note that these analyses were conducted on the non-balanced dataset, which contained enough vocalizations to identify the dominant male from the vocalization rates; individual traits of certain animals therefore remain a possible confound.
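
      The response does not give the details of the bootstrap, so the following is only a sketch under stated assumptions: positions are (x, y) coordinates of the female relative to a male's snout, each histogram is normalized to fractions, bounds are per-bin percentiles over the bootstrap replicates, and a bin counts as different when the two animals' intervals do not overlap. All function names and the bin size are hypothetical and not taken from the authors' code.

```python
import random
from collections import Counter


def normalized_histogram(positions, bin_size):
    """2D histogram of (x, y) positions as {bin: fraction of samples}."""
    counts = Counter((int(x // bin_size), int(y // bin_size)) for x, y in positions)
    total = len(positions)
    return {b: c / total for b, c in counts.items()}


def bootstrap_bounds(positions, bin_size, n_boot=100, alpha=0.01, seed=0):
    """Per-bin percentile bounds of the normalized histogram.

    Positions are resampled with replacement n_boot times. For each bin the
    (alpha/2) and (1 - alpha/2) quantiles of the resampled fractions are kept.
    """
    rng = random.Random(seed)
    samples = {}  # bin -> list of fractions, one entry per replicate
    for i in range(n_boot):
        resampled = rng.choices(positions, k=len(positions))
        for b, frac in normalized_histogram(resampled, bin_size).items():
            # Replicates in which the bin is empty keep the default 0.0.
            samples.setdefault(b, [0.0] * n_boot)[i] = frac
    lo_idx = int((alpha / 2) * (n_boot - 1))
    hi_idx = int(round((1 - alpha / 2) * (n_boot - 1)))
    bounds = {}
    for b, vals in samples.items():
        vals.sort()
        bounds[b] = (vals[lo_idx], vals[hi_idx])
    return bounds


def significant_bins(pos_a, pos_b, bin_size, n_boot=100, alpha=0.01, seed=0):
    """Bins whose bootstrap intervals for the two animals do not overlap."""
    bounds_a = bootstrap_bounds(pos_a, bin_size, n_boot, alpha, seed)
    bounds_b = bootstrap_bounds(pos_b, bin_size, n_boot, alpha, seed + 1)
    result = []
    for b in set(bounds_a) | set(bounds_b):
        lo_a, hi_a = bounds_a.get(b, (0.0, 0.0))
        lo_b, hi_b = bounds_b.get(b, (0.0, 0.0))
        if lo_a > hi_b or lo_b > hi_a:
            result.append(b)
    return sorted(result)


# Synthetic example (not real data): animal A is mostly near the origin bin,
# animal B mostly in a distant bin.
pos_a = [(0.5, 0.5)] * 90 + [(5.5, 5.5)] * 10
pos_b = [(0.5, 0.5)] * 10 + [(5.5, 5.5)] * 90
print(significant_bins(pos_a, pos_b, bin_size=1.0))
```

      With only 100 replicates, the 99% percentile bounds in this sketch are essentially the replicate minimum and maximum, so a larger number of replicates would give smoother bounds.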

      A small proportion of USVs was not assigned. The authors did not discuss the potential reason for this failure (Were the USVs too soft? Did they include specific acoustic characteristics that render them difficult to localise?). These points could be of interest when testing other mouse strains or other species.

      Good point, we agree that it is interesting to know the reasons for failure. As so often, there is not a single property that makes localization hard, but multiple factors contribute. In the SLIM paper, we already identified duration and intensity as important contributors (Fig. 3E/F), and in the speaker test (see new Supplementary Fig. 4) we again demonstrated the influence of intensity. In addition, frequency bandwidth and acoustic occlusion are two other main contributors that each influence the availability of the information/signal-to-noise ratio at the microphones:

      • Frequency bandwidth: In signals that are very narrowband, there are more opportunities for phase ambiguity, in particular for very high-frequency signals. These are avoided/reduced for more wideband signals.

      • Acoustic occlusion: As ultrasonic sounds can be quite directional, an animal that vocalizes away from a microphone, and whose body is then additionally in the path of the sound to that microphone, can reduce the intensity at the microphone to a level at which the signal from this microphone is no longer usable. This mostly influences the 4 microphones surrounding the platform, while the Cam64 overhead will likely not be affected by acoustic occlusion in the plane.
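
      The phase-ambiguity argument in the first bullet can be illustrated with a small calculation. The sketch below is not part of the localization method. It assumes a pure tone, a single microphone pair, a speed of sound of 343 m/s, a true delay of zero, and hypothetical microphone spacings, and it counts how many inter-microphone delays are consistent with one measured phase difference.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def ambiguous_delay_count(freq_hz: float, mic_spacing_m: float) -> int:
    """Count delays between two microphones that share one phase difference.

    Physically possible delays lie in [-d/c, +d/c]. For a pure tone, delays
    that differ by a whole period 1/f produce the same phase. Assuming a true
    delay of zero, the candidates are k/f for every integer k with
    |k/f| <= d/c.
    """
    max_delay = mic_spacing_m / SPEED_OF_SOUND
    period = 1.0 / freq_hz
    return 2 * math.floor(max_delay / period) + 1


# Hypothetical spacings, chosen only to show the effect.
print(ambiguous_delay_count(80_000.0, 0.1))    # 47 candidate delays
print(ambiguous_delay_count(80_000.0, 0.002))  # 1, spacing below half a wavelength
```

      A wideband call contains many frequencies whose candidate sets differ, so only delays near the true one are consistent across the whole band, which is one way to read the statement that wideband signals reduce the ambiguity.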

      We have added a brief version of this explanation to the discussion under the heading "Current limitations and future improvements of the presented system".

    1. Author Response

      Reviewer #1 (Public Review):

      Hoang, Tsutsumi and colleagues use 2-photon calcium imaging to study the activity of Purkinje cells during a Go/No-go task and related this activity to their location in Aldolase-C bands. Tensor component analysis revealed that a substantial part of the calcium responses can be linked to four functional components. The manuscript addresses an important question with an elegant technical approach and careful analysis. There are a few points that I think could be addressed to further improve the quality of the manuscript.

      1) The authors should be careful not to overstate the goal and results. For instance, in the abstract it is stated that dynamical functional organization is necessary for dimension reduction. However, the statement that the 4 TCs together account for about half of the variance (line 220) indicates that dimensionality may not be reduced that much. I would suggest revising the first and last sentence of the abstract accordingly.

      Dynamic functional organization of TC1 and TC2 by synchronization is the major finding of this study and we believe that it is one of the most efficient mechanisms of dimension reduction, given the unique anatomy of the cerebellum. In the revised manuscript, we added a supplemental result showing that the dimensionality of TC1 and TC2 neurons decreased and increased, respectively, in accordance with bi-directional changes in their synchronization (Figure 3 – figure supplement 1DE). Dimension reduction was further confirmed by conventional PCA (Figure 6 – figure supplement 1). However, we agree that the statement that the cerebellum reduces dimensions by self-organization of components is speculative, and we revised the abstract accordingly.

      At the end of the introduction, the authors refer to "the first evidence supporting the two major theories of cerebellar function" but which two theories is referred to and how this manuscript support them is not very obvious. Similarly, they state that "This study unveiled the secret of cerebellar functional architecture", which I would consider to be an unnecessary overstatement of the impact of the work described.

      In the revised Introduction, we explicitly stated that TC1 and TC2 are related to timing control and cognitive error learning, respectively, with some indirect causal evidence. We also revised the last paragraph of the Introduction to emphasize that this study provides the first evidence to support the view that distinct cerebellar components may serve divergent cerebellar functions in a single task. The statement "This study unveiled the secret of cerebellar functional architecture" was removed.

      In the title, the authors use the word modular. In the consensus paper on cerebellar modules (Apps et al., 2018) an attempt is made to unify the terms used to describe cerebellar anatomical structures. Here "module" is used for the longitudinal zone of interconnected PCs, CN neurons and olivary neurons. As the authors only studied PC activity (and indirectly the IO), I would suggest using band, stripe or subpopulation instead.

      Because we used TCA to identify functional components underlying the Go/No-go data, we changed the word “module” to “component” in the title.

      Finally, the term "CF firing" or "CF activity" is used when referring to the recorded signals. However, the authors measure postsynaptic calcium responses that are indeed likely driven by CF inputs, but could also be influenced by PF inputs. At the very least, because Purkinje cells and not climbing fibers are being imaged, "complex spike" should be used instead. It would be more accurate still to use the more general "calcium response" and make less of an assumption about the origin of the calcium response.

      In this study, CF-dependent dendritic Ca²⁺ signals in adjacent AldC compartments were recorded by two-photon imaging. The HA_time algorithm (Hoang et al. 2020) was then applied to extract spike timings from the recorded signals. In the revised manuscript, we used the terms “calcium responses” and “complex spikes” when referring to the recorded Ca²⁺ signals and the estimated spikes, respectively.

      2) For some figure panels and statements in the manuscript error bars or confidence intervals and statistics are missing. This is the case for, for example, the changes in fraction correct, lick latency, fraction incorrect, etc. (Fig 1B, 2E-F, TC levels in 3, 4D-E and 5A-C). Including these is particularly relevant in Fig 4E as this is a key result, mentioned also in the abstract. Please indicate clearly if these plots are cumulative for all mice or per mouse and averaged. I advise the authors to statistically support the claim that the changes are significant and in opposite direction as this element of the study is referred to in the abstract and discussion (summary).

      We added the error bars / confidence intervals to the related figures. Most importantly, we added histograms of synchrony strength for TC1/TC2 neurons (Figure 4E) and conducted statistical tests to strengthen the claim of bi-directional changes in synchronization of TC1/TC2.

      3) Data presentation sometimes does not do the work justice. For example, the data in Figure 6 are very interesting, but hard to read because of the design of the figure. It is clear how the components are mostly confined to Aldolase-C domains, but within the domains the distribution is not clear. I would advise to also more clearly indicate what the locations of the colors within the bands refers to. The spatial distribution of the selected top 300 cells for each TC could be added.

      We added pie-chart plots for the fraction of TC1-4 neurons in each Ald-C zone and learning stage. We also indicated in the figure legend that the location of a single-color bar referred to the geographic distance of the corresponding neuron relative to Ald-C boundaries. We included spatial distribution of the selected neurons in Figure 4 – figure supplement 1D.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript reports a study investigating the reporting practices in three top cardiovascular research journals for articles published in 2019. The study was preregistered, which makes the intent and methodology transparent, and the authors also make their materials, data, and code open. While the preregistration and sampling strategy are strengths, the study suffers from a higher than expected number of non-empirical articles, which decreases the sample size and thus the inferences that can be drawn. The authors' focus was mainly on transparency of reporting and not on the actual reproducibility or replicability of the articles; however, the accessibility of data, code, materials, and methods is a prerequisite. While the authors were still able to draw inferences about their main objectives, they could not perform some of their proposed analyses because of a small sample size (due partly to fewer than half of the articles in their sample being empirical, as well as to the low number of papers with accessible information to code). One of the descriptive analyses they performed, the country-level scores (Figure 6), suffers in particular from the small sample size, and while the authors indicate this in their manuscript, I do not think it is reasonable to include it, as it has the potential to be misinterpreted since so many of the scores are based on n=1. Overall, I found the authors' presentation and discussion clear and concise; however, the lack of a more in-depth discussion is an area in which the current manuscript could be improved. The manuscript outlines opportunities for researchers, journals, funders, and institutions to improve the way cardiovascular research is reported to enable discovery, reuse, and reproducibility.

      We appreciate the reviewer’s recognition of our pre-registration, methodology, and resource sharing and also their feedback regarding the small sample size of empirical research articles and need for a more in-depth discussion of the impacts of our study. We have now increased the number of empirical studies to a total of 393 out of 639 articles screened. We also agree that our study focuses more on transparency than reproducibility and replicability, and we have changed our title to reflect this. While the sample size of empirical papers has increased, a comparison of accessibility scores across countries continued to suffer from small sample size and we have removed it based on the recommendation of the reviewers. We have updated the Materials and Methods section to reflect our updated analyses, as well as included additional paragraphs on Limitations and Future Work in our Discussion to acknowledge future improvements that could be made to the accessibility score used in our study.

      Reviewer #2 (Public Review):

      This is a descriptive paper in the field of metascience, which documents levels of accessibility and reproducible research practices in the field of cardiovascular science. As such, it does not make a theoretical contribution, but it argues, first, that there is a problem for this field, and second, it provides a baseline against which the impact of future initiatives to improve reproducibility can be assessed. The study was pre-registered and the methods and data are clearly documented. This kind of study is extremely labour-intensive and represents a great deal of work.

      I have a major concern about the analysis. It is stated that to be fully reproducible, publications must include sufficient resources (materials, methods, data and analysis scripts). But how about cases where materials are not required to reproduce the work? In line 128-129 it is noted that the materials criterion was omitted for meta-analyses, but what about other types of study where materials may be either described adequately in the text, readily available (eg published questionnaires), or impossible to share (e.g. experimental animals).

      To see how valid these concerns might be, I looked at the first 4 papers in the deposited 'EmpricalResearchOnly.csv' file. Two had been coded as 'No Materials availability statement' and for two the value was blank.

      Study 1 used registry data and was coded as missing a Materials statement. The only materials that I could think might be useful to have might be 'standardized case report forms' that were referred to. But the authors did note that the Registry methods were fully documented elsewhere (I am not sure if that is the case).

      Study 2 was a short surgical case report - for this one the Materials field was left blank by the coder.

      Study 3 was a meta-analysis; the Materials field was left blank by the coder.

      Study 4 was again coded as lacking a Material statement. It presented a model predicting outcome for cardiac arrhythmias. The definitions of the predictor variables were provided in supplementary materials. I am not clear what other materials might be needed.

      These four cases suggest to me that it is rather misleading to treat lack of a Materials statement as contributing to an index of irreproducibility. Certainly, there are many studies where this is the case, but it will vary from study to study depending on the nature of the research. Indeed, this may also be true for other components of the irreproducibility index: for instance, in a case study, there may be no analysis script because no statistical analysis was done. And in some papers, the raw data may all be present in the text already - that may be less common, but it is likely to be so for case studies, for instance.

      A related point concerns the criteria for selecting papers for screening: it was surprising that the requirement for studies to have empirical data was not imposed at the outset. It should be possible to screen these out early on by specifying 'publication type'; instead, they were included, and that means that the numbers used for the actual analysis are well below 400. The large number of non-empirical papers is not of particular relevance for the research questions considered here. In the Discussion, the authors expressed surprise at the large number of non-empirical papers they found; I felt it would have been reasonable for them to depart from their preregistered plan on discovering this, and to review further papers to bring the number up to 400, restricting consideration to empirical papers only and also excluding case reports, which pose their own problems in this kind of analysis.

      A more minor point is that some of the analyses could be dropped. The analysis of authorship by country had too few cases for many countries to allow for sensible analysis.

      Overall, my concern is that the analysis presented here may create a backlash against metascientific analyses like this because it appears unfair on authors to use a metric based on criteria that may not apply to their study. I am strongly in favour of open, reproducible science, and agree it is important to document the state of the science for different disciplines. But what this study demonstrates to me is that if you are going to evaluate papers as to whether they include things like materials/data availability statements, then you need to have an N/A option. Unfortunately, I suspect it may not be possible to rely on authors' self-evaluation of N/A, and that means that metascientists doing an evaluation would need to read enough of the paper to judge whether such a statement should apply.

      We thank the reviewer for the time taken to review our paper, the appreciation of the work we conducted, and for the suggestions for improving our research methods. To address the initial concern about our analytical approach, the definition for fully reproducible publications that we used was only applicable to research that utilized empirical research methods. We recognize that publications such as editorials and reviews are not inherently reproducible experimental studies; thus, such papers were not provided with an accessibility score, were only screened for the components such as funding and conflict of interest information, and were only compared amongst each other. Additionally, articles such as meta-analyses and systematic reviews that do not include materials had adjusted accessibility scores. We expanded our Methods and Discussion section to further explain our screening process and our assumption that all empirical research articles contain methods, data, and analysis scripts and to acknowledge the limitations of our approach. We also agree that screening more empirical research articles is more in line with the intent of our pre-registration and we expanded the number of empirical research articles screened to 393. We also agree with the reviewer that the analysis by country should be excluded because of the small sample size for most countries, and we have adjusted the manuscript accordingly.

    1. Reviewer #1 (Public Review):

      The authors present a back-of-the-envelope exploration of various possible resource allocation strategies for ITNs. They identify two optimal strategies based on two slightly different objective functions and compare 3 simple strategies to the outcomes of the optimal strategies and to each other. The authors consider both P falciparum and P vivax and explore this question at the country level, using 2000 prevalence estimates to stratify countries into 4 burden categories.

      This is a relevant question from a global funder perspective, though somewhat less relevant for individual countries since countries are not making decisions at the global scale. The authors have made various simplifications to enable the identification of optimal strategies, so much so that I question what exactly was learned. It is not surprising that strategies that prioritize high-burden settings would avert more cases. Generally, I found much of the text confusing and some concepts were barely explained, such that the logic was difficult to follow.

      I am not sure why the authors chose to stratify countries by 2000 PfPR estimates and in essence explore a counterfactual set of resource allocation strategies rather than begin with the present and compare strategies moving forward. I would think that beginning in 2020 and modeling forward would be far more relevant, as we can't change the past. Furthermore, there was no comparison with allocations and funding decisions that were actually made between 2000 and 2020ish so the decision to begin at 2000 is rather confusing.

      I realize this is a back-of-the-envelope assessment (although it is presented to be less approximate than it is, and the title does not reveal that the only intervention strategy considered is ITNs) but the number and scope of modeling assumptions made are simply enormous. First, that modeling is done at the national scale, when transmission within countries is incredibly heterogeneous. The authors note a differential impact of ITNs at various transmission levels and I wonder how the assumption of an intermediate average PfPR vs modeling higher and lower PfPR areas separately might impact the effect of the ITNs. Second, the effect of ITNs will differ across countries due to variations in vector and human behavior and variation in insecticide resistance and susceptibility to the ITNs. The authors note this as a limitation but it is a little mind-boggling that they chose not to account for either factor since estimates are available for the historical period over which they are modeling. Third, the assumption that elimination is permanent and nothing is needed to prevent resurgence is, as the authors know, a vast oversimplification. Since resources will be needed to prevent resurgence, it appears this assumption may have a substantial impact on the authors' results.

      The decision to group all settings with EIR > 7 together as "high transmission" may perhaps be driven by WHO definitions but at a practical level this groups together countries with EIR 10 and EIR 500. Why not further subdivide this group, which makes sense from a technical perspective when thinking about optimal allocation strategies?

      The relevance of this analysis for elimination is a little questionable since no one eliminates with ITNs alone, to the best of my understanding.

    1. Author Response

      Reviewer #3 (Public Review):

      Because of the position of pigeon embryos in eggs, light exposure will only stimulate the right eye, leading to lateralisation of brain responses and behaviour. Lorenzi and colleagues injected manganese chloride into pigeon eggs, to assess neuronal activation in the embryonic brain. While the eggs were placed in the light or dark, manganese ions accumulated in neurons that were activated (in cell bodies and axons), which was then visualized with MRI of the embryos before hatching. The authors report lateralisation of neuronal activity in three brain regions, which could potentially be important for our understanding of experience-dependent development of lateralised neural activation.

      The tectofugal pathway in pigeons projects from the retina to the optical tectum, then to the nucleus rotundus in the thalamus, and then to the entopallium. The thalamofugal pathway projects from the retina to the GLd in the thalamus, and then to the wulst in the hyperpallium. The two pathways involve different thalamic nuclei (e.g., Deng 2006). In the methods and throughout the manuscript it should be specified which thalamic region is used as ROI.

      Here we refer to the GLd in the thalamofugal visual pathway; we did not estimate activity in the n. rotundus. We have now clarified this point in the revised MS (ll. 54, 80, 86).

      This manuscript only describes neural activity, but the MEMRI technique should also be used to assess the effect of experimental manipulations on axonal connectivity. It is important to learn about the asymmetry of contralateral projections in the light vs dark groups for answering the research question.

      Here we used systemic administration of Mn through the CAM. The blood-brain barrier is not completely developed at this embryonic stage, and its permeability to ions and small molecules is considerably higher in the embryo than at later stages of development (Engelhardt, B. (2003). Development of the blood-brain barrier. Cell and Tissue Research, 314(1), 119-129). Studies involving direct, local injection into selected brain regions are better suited to investigating connectivity, but this is not the protocol used here. We appreciate the reviewer’s suggestion, and this will be the object of future experiments. However, we would like to disseminate the current protocol and the results it led to at an early stage to enable and encourage its use by other researchers in the field.

      There is an overinterpretation of post-hoc statistics that are reported without correction for multiple testing. The wulst light group lateralization is probably not actually different from zero (uncorrected p=0.04).

      We considered the reviewer's observation regarding the need for improvements in the statistical methods. In response, we have made amendments to the relevant section of the manuscript, explicitly stating that significant findings were obtained using a two-way ANOVA. For comparisons between conditions within specific brain regions, we conducted two-sample t-tests, and the results were corrected for Type I errors using the false discovery rate (FDR) method. Post-hoc one-sample t-tests were employed to assess lateralization across brain regions and conditions, and the corresponding p-values were reported without correction for multiple comparisons (as explicitly reported in the text, to avoid any confusion).
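
      The response names a false discovery rate correction without specifying the procedure. As an illustration only, the sketch below assumes the common Benjamini-Hochberg step-up procedure. The function name and the p-values are hypothetical and are not taken from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each one is the minimum of p * m / rank over all ranks
    # at or above its own.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four comparisons.
raw = [0.001, 0.02, 0.04, 0.30]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.3f}, adjusted p = {q:.3f}, significant at 0.05: {q < 0.05}")
```

      In this made-up example the third comparison has a raw p-value below 0.05 but an adjusted value above it, which is the kind of outcome that motivates reporting whether each test was corrected.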

      The first line in the discussion states that there is thalamofugal lateralization, but no lateralization in the tectofugal pathway. To my understanding, previous literature reported it the other way around: in altricial pigeons, light exposure in the egg mainly affected the tectofugal pathway (Deng & Rogers 2002), while the thalamofugal pathway in pigeons was not lateralized (Strockens et al., 2013). The manuscript should compare the current findings with the literature and discuss differences.

      We are aware of the substantial differences in brain lateralization of the two visual pathways between pigeons and chicks after embryonic light exposure. However, in the present work we employed chick embryos (Gallus gallus domesticus), and the space limitations of a Brief Communication do not allow for an in-depth discussion of these differences between avian species.

      Moreover, the tectum is the only region shown here from the tectofugal pathway. However, lateralization of contralateral connections is expected from tectum to the nucleus rotundus in the thalamus, and thus lateralization of activation may only arise in downstream brain regions from the optical tectum. Therefore, the conclusion that there is no lateralization in the tectofugal pathway is not supported by the data.

      In conclusion, I think it is interesting and worthwhile that the authors assessed neural activity in response to visual stimulation in the embryo prior to hatching, but multiple methodological weaknesses and unclarities should be addressed.

      The ROI that we here named Thalamus does not include the nucleus rotundus, but is referring to the nucleus geniculatus lateralis (Gld). We have now clarified this point in the revised MS (ll. 54, 80, 86), and we now refer only to the tectum, without generalizing to the entire tectofugal pathway, which will be the subject of future investigations.

    1. Reviewer #3 (Public Review):

      There has been a long-standing link between the biology of sulfur-containing molecules (e.g., hydrogen sulfide gas, the amino acid cysteine, and its close relative cystine, et cetera) and the biology of hypoxia, yet we have a poor understanding of how and why these two biological processes are co-regulated. Here, the authors use C. elegans to explore the relationship between sulfur metabolism and hypoxia, examining the regulation of cysteine dioxygenase (CDO1 in humans, CDO-1 in C. elegans), which is critical to cysteine catabolism, by the hypoxia inducible factor (HIF1 alpha in humans, HIF-1 in C. elegans), which is the key terminal effector of the hypoxia response pathway that maintains oxygen homeostasis. The authors are trying to demonstrate that (1) the hypoxia response pathway is a key regulator of cysteine homeostasis, specifically through the regulation of cysteine dioxygenase, and (2) that the pathway responds to changes in cysteine homeostasis in a mechanistically distinct way from how it responds to hypoxic stress.

      Briefly summarized here, the authors initiated this study by generating transgenic animals expressing a CDO-1::GFP protein chimera from the cdo-1 promoter so that they could identify regulators of CDO-1 expression through a forward genetic screen. This screen identified mutants with elevated CDO-1::GFP expression in two genes, egl-9 and rhy-1, whose wild-type products are negative regulators of HIF-1, raising the possibility that cdo-1 is a HIF-1 transcriptional target. Indeed, the authors provide data showing that cdo-1 regulation by EGL-9 and RHY-1 is dependent on HIF-1 and that regulation by RHY-1 is dependent on CYSL-1, as expected from other published findings of this pathway. The authors show that exogenous cysteine activates cdo-1 expression, reflective of what is known to occur in other systems. Moreover, they find that exogenous cysteine is toxic to worms lacking CYSL-1 or HIF-1 activity, but not CDO-1 activity, suggesting that HIF-1 mediates a survival response to toxic levels of cysteine and that this response requires more than just the regulation of CDO-1. The authors validate their expression studies using a GFP knockin at the cdo-1 locus, and they demonstrate that a key site of action for CDO-1 is the hypodermis. They present genetic epistasis analysis supporting a role for RHY-1, both as a regulator of HIF-1 and as a transcriptional target of HIF-1, in offsetting toxicity from aberrant sulfur metabolism. The authors use CRISPR/Cas9 editing to mutate a key amino acid in the prolyl hydroxylase domain of EGL-9, arguing that EGL-9 inhibits CDO-1 expression through a mechanism that is largely independent of the prolyl hydroxylase activity.

      Overall, the data seem rigorous, and the conclusions drawn from the data seem appropriate. The experiments test the hypothesis using logical and clever molecular genetic tools and design. The sample size is a bit lower than is typical for C. elegans papers; however, the experiments are clearly not underpowered, so this is not an issue. The paper is likely to drive many in the field (including the authors themselves) into deeper experiments on (1) how the pathway senses hypoxia and sulfur/cysteine/H2S using these distinct mechanisms/modalities, (2) how oxygen and sulfur/cysteine/H2S homeostasis influence one another, and (3) how this single pathway evolved to sense and respond to both of these stress modalities.

      Major strengths of the paper include (1) the use of the powerful whole animal C. elegans model to reveal results that have meaning in vivo, (2) the careful demonstration through mutant rescue experiments that key transgenes have functional activity, (3) the use of CRISPR/Cas9 editing to mutate a critical residue in the catalytic domain of the EGL-9 prolyl hydroxylase, (4) transgenic rescue experiments that show that CDO-1 operates in the hypodermis with regard to the larval arrest phenotype, and (5) the thorough epistatic analysis of different pathway mutants.

      Major weaknesses of the paper include (1) the over-reliance on genetic approaches, (2) the lack of novelty regarding prolyl hydroxylase-independent activities of EGL-9, and (3) the lack of biochemical approaches to probe the underlying mechanism of the prolyl hydroxylase-independent activity of EGL-9.

      Major Issues We Feel the Authors Should Address:

      1. One particularly glaring concern is that the authors really do not know the extent to which the prolyl hydroxylase activity is (or is not) impacted by the H487A mutation in egl-9(rae276). If there is a fair amount of enzymatic activity left in this mutant, then it complicates interpretation. The paper would be strengthened if the authors could show that the egl-9(rae276) eliminates most if not all prolyl hydroxylase activity. In addition, the authors may want to consider doing RNAi for egl-9 in the egl-9(rae276) mutant as a control, as this would support the claim that whatever non-hydroxylase activity EGL-9 may have is indeed the causative agent for the elevation of CDO-1::GFP. Without such experiments, readers are left with the nagging concern that this allele is simply a hypomorph for the single biochemical activity of EGL-9 (i.e., the prolyl hydroxylase activity) rather than the more interesting, hypothesized scenario that EGL-9 has multiple biochemical activities, only one of which is the prolyl hydroxylase activity.

      2. The authors observed that EGL-9 can inhibit HIF-1 and the expression of the HIF-1 target cdo-1 through a combination of activities that are (1) dependent on its prolyl hydroxylase activity (and subsequent VHL-1 activity that acts on the resulting hydroxylated prolines on HIF-1), and (2) independent of that activity. This is not a novel finding, as the authors themselves carefully note in their Discussion section, as this odd phenomenon has been observed for many HIF-1 target genes in multiple publications. While this manuscript adds to the description of this phenomenon, it does not really probe the underlying mechanism or shed light on how EGL-9 has these dual activities. This limits the overall impact and novelty of the paper.

      3. Cysteine dioxygenases like CDO-1 operate in an oxygen-dependent manner to generate sulfites from cysteine. CDO-1 activity is dependent upon availability of molecular oxygen; this is an unexpected characteristic of a HIF-1 target, as its very activation is dependent on low molecular oxygen. Authors neither address this in the text nor experimentally, and it seems a glaring omission.

      4. The authors determined that the hypodermis is the site of the most prominent CDO-1::GFP expression, relevant to Figure 4. This claim would be strengthened if a negative control tissue, in the animal with the knockin allele, were shown. The hypodermal specific expression is a highlight of this paper, so it would make this article even stronger if they could further substantiate this claim.

      Minor issues to note:

      Mutants for hif-1 and cysl-1 are sensitive to exogenous cysteine levels, yet loss of CDO-1 expression is not sufficient to explain this phenomenon, suggesting other targets of HIF-1 are involved. Given the findings the authors (and others) have had showing a role for RHY-1 in sulfur amino acid metabolism, shouldn't the authors consider testing rhy-1 mutants for sensitivity to exogenous cysteine?

      The cysteine exposure assay was performed by incubating nematodes overnight in liquid M9 media containing OP50 culture. The liquid culture approach adds two complications: (1) the worms are arguably starving or at least undernourished compared to animals grown on NGM plates, and (2) the worms are probably mildly hypoxic in the liquid cultures, which complicates the interpretation.

      An easily addressable concern is the wording of one of the main conclusions: that cdo-1 transcription is independent of the canonical prolyl hydroxylase function of EGL-9 and is instead dependent on one of EGL-9's non-canonical, non-characterized functions. There are several points in which the wording suggests that CDO-1 toxicity is independent of EGL-9. In their defense, the authors try to avoid this by saying, "EGL-9 PHD," to indicate that it is the prolyl hydroxylase function of EGL-9 that is not required for CDO-1 toxicity. However, this becomes confusing because much of the field uses PHD and EGL-9/EGLN as interchangeable protein names. The authors need to be clear about when they are describing the prolyl hydroxylase activity of EGL-9 rather than other (hypothesized) activities of EGL-9 that are independent of the prolyl hydroxylase activity.

      The authors state in the text, "the egl-9; suox-1 double mutants are extremely sick and slow growing." We appreciate that their "health" assay, based on the exhaustion of food from the plate, is qualitative. We also appreciate that it is a functional measure of many factors that contribute to how fast a population of worms can grow, reproduce, and consume that lawn of food. However, unless they do a lifespan assay and/or measure developmental timing and specifically determine that the double mutant animals themselves are developing and/or growing more slowly, we do not think it is appropriate to use the words "slow growing" to describe the population. As they point out, the rate of consumption of food on the plate in their health assay is determined by a multitude and indeed a confluence of factors; the growth rate is one specific one that is commonly measured and has an established meaning.

    1. Neither Spread of U.S. Slavery nor Invasion of America uses language explicitly condemning slavery or imperialism, allowing the map’s usage by potentially racist and xenophobic visitors. The objective, socially-neoliberal portrayal of data without subjectivity perpetuates color-blind racism and allows bigotry to take root.

      I think this may be precisely because these maps are scholarly maps. Members of academia tend to avoid making a "subjective" or "biased" argument, especially regarding historical matters. On the other hand, non-scholarly maps created bottom-up through community engagement (such as the Anti-Eviction Mapping Project referenced in Data Feminism) can more explicitly call out injustices. I want to learn more about the ways in which we can complement the limitations of scholarly mapping projects.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their comments. We have now addressed all the comments in a revised version of the manuscript, which we believe has strengthened our paper.

      1) Introduction LINE 60: the authors cite Funato et al 2016 as the paper first describing a role for Sik3 in sleep regulation. In fact, the role for this kinase was first identified nearly a decade earlier in C. elegans (Van der Linden et al, Genetics 2008 PMID 18832350).

      Thank you for pointing us to this reference. Van der Linden et al. demonstrated that the C. elegans homolog of Sik3 (KIN-29) regulates satiety quiescence, in which worms stop moving following feeding on high quality food. However, as pointed out in Trojanowski and Raizen “Call it Worm Sleep” (2016), not all of the behavioral criteria for sleep have been applied to C. elegans satiety quiescence, and we cannot find any references that unequivocally demonstrate satiety quiescence is a sleep state. As McClanahan et al. (2020) show, quiescent states following mild sensory arousal do not fulfill the sleep criteria of changes in arousal threshold and homeostatic regulation, so not all quiescent states in C. elegans are sleep. Then again, Grubbs et al. (2020) do demonstrate that KIN-29 regulates both developmentally timed and stress-induced sleep states in worms, suggesting that the observations in Van der Linden et al. were ahead of their time and that these behavioral states are possibly inter-related. We believe, though, that our line “the roles of… SIK3 kinase in modulating sleep homeostasis in mice (Funato et al. 2016) were identified in genetic screens” remains accurate.

      2) Introduction LINE 71: remove the word "known" from "...while some known human sleep/wake regulators, such as the..."

      Good idea. Done.

      3) I was confused regarding Supplemental data 1 describing the genes they targeted with their forward genetic screen. Am I understanding correctly from the "Summary stats" tab that 702 fish lines with virus insertions were screened behaviorally? In Figure S1, it looks like about 60 are shown in the histograms but in the text (in the Discussion) they say 25 were screened. Were all the genes listed under the Excel tabs (GPCRs, channels, etc) tested? Or was just a subset tested? Where are the sleep data for these lines? Negative results may be relevant to their manuscript since they listed (tested??) a number of ion channel genes under tab "channels" which appear to NOT have a sleep phenotype.

      We apologize for the confusion on these points. As highlighted in the legend to Supplementary Figure S1, we had planned a screening strategy with the following pipeline: Candidate mammalian gene → Zebrafish ortholog → ID viral insertion from “Zenemark” library → grow viral insertion lines from frozen sperm → phenotype F3 heterozygous and homozygous mutant generation. Unfortunately, the company, Znomics, which held the Zenemark library, could not reliably reconstitute the correct live fish from the sperm library, and of the 702 lines we planned to screen, we could only screen 26 (25 was a typo) lines. We treated heterozygous and homozygous animals for each line independently, for a total of 52 screened lines in the histograms.

      To make this clearer, we have edited the main text as follows (lines 104-105): “For screening, we identified zebrafish sperm samples from the Zenemark collection (Varshney et al., 2013) that harboured viral insertions in genes of interest and used these samples for in vitro fertilization and the establishment of F2 families, which we were able to obtain for 26 lines.” And lines 111-112: “While most screened heterozygous and homozygous lines had minimal effects on sleep-wake behavioural parameters (Figure S1B-S1C),”

      We believe it is important to include the full set of Supplementary Data 1, even though the vast majority of these candidate lines were not tested.

      4) Results LINE 117: remove the word "prominent", which is subjective, from the sentence "...showed a prominent decrease in sleep during the..."

      Good point. Done.

      5) LINES 185-186: did you see any circadian variation in your dmist:GFP protein abundance or localization? Protein trafficking has been described as a mechanism of circadian regulation of excitability.

      For practical reasons, we imaged the membrane localization of Dmist:GFP in plasmid-injected embryos at 90% epiboly, which is about 9 hours after fertilization and when the cells remain large and in a relatively flat epithelium. Thus, we could not follow circadian fluctuations in abundance or localization. For circadian studies, we believe the best method will be to raise an antibody that recognizes Dmist.

      6) LINE 203: does the GFP-tagged Dmist rescue the loss-of-function phenotype? This is relevant to Figure 2E. It is also relevant to the issue of structure-function. If it rescues, then the C-terminus may not be essential to protein function.

      As noted, for practical reasons, we observed Dmist-GFP only transiently at early stages of development, expressed using a strong, ubiquitous promoter. A rescue experiment is a good idea for future experiments, where we carefully control the expression of Dmist in neurons.

      7) LINE 220: explain what you mean by "...consistent with nonsense-mediated decay." and/or give a reference.

      In zebrafish and other species including humans, mutant transcripts that have premature stop codons often undergo “nonsense mediated decay”, whereby the expression levels are largely reduced (Wittkopp et al., 2009). In the zebrafish community, this is often used as secondary evidence of a loss of function mutation, as relatively few antibodies are available to directly observe zebrafish proteins. We have added a reference that describes this phenomenon (Wittkopp et al., 2009).

      8) LINE 225: define "LME model"

      Now reads: “Linear mixed effects (LME).”

      9) LINES 227-229: could the vir/vir phenotype be explained by specific effects on protein structure? could vir/vir be a gain-of-function allele?

      We can’t rule this out formally, and vir/+ animals do show some sleep phenotypes, albeit weaker than those of vir/vir animals (Figure 1G). However, it is not uncommon for heterozygous mutants to show significant phenotypes that are weaker than those of their homozygous mutant siblings, and the strong suppression of dmist expression by the viral insertion (which is located in the dmist intron) is more consistent with a hypomorphic loss-of-function phenotype for the vir allele.

      10) LINES 229-230: I don't quite follow the argument for pursuing further studies only of i8/i8. i8/i8 seems to also be a hypomorphic allele based on your qPCR data.

      First, the dmist viral line was generated by an insertional mutagenesis method followed by sequencing, and each line has multiple other inserts in a background that does not match the background of the other animals reported in this paper. Second, the dmist vir allele is an insertion in the intron, leading to reduced, but not complete loss of expression. In contrast, the i8 allele was generated on the same background strain as our other existing and newly reported lines. Moreover, our i8 line is likely a loss-of-function allele and not a hypomorph. Yes, dmist expression is reduced in the i8 allele; however, this is likely due to nonsense mediated decay of dmist mRNA. The mutation introduces a frameshift in the dmist coding sequence, and as a result the amino acid sequence of the protein is altered after the N-terminal signal sequence.

      11) LINES 241-243: grammar.

      Fixed

      12) LINE 245: define "JackHMMR iterative search"

      We’ve added the phrase: “and seeding a hidden Markov model iterative search (JackHMMR)”

      13) LINE 246 is missing the word "we" prior to "...found distant homology between..."

      Added

      14) LINE 301: show data demonstrating deviation from Mendelian ratios. Also, comment on meaning of such data (embryonic lethality??).

      We have added this data in the line (301):

      “atp1a3b mutant larvae were not obtained at Mendelian ratios (55 wild type [52.5 expected], 142 [105] atp1a3b+/-, 13 [52.5] atp1a3b-/-; p<0.0001, Chi-squared) suggesting some impact on early stages of development leading to lethality.”
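
      The arithmetic behind this statement can be checked with a short sketch. It assumes the expected counts follow the standard 1:2:1 Mendelian ratio for a heterozygous incross and that the test is a chi-squared goodness-of-fit test with 2 degrees of freedom; neither detail is stated explicitly in the response beyond "Chi-squared", and the variable names are illustrative only. For 2 degrees of freedom the chi-squared survival function reduces to exp(-x/2), so no statistics library is needed.

```python
import math

# Observed genotype counts quoted in the response: wild type, +/-, -/-.
observed = [55, 142, 13]

# Assumption: expected counts follow a 1:2:1 Mendelian ratio.
mendelian_ratio = [0.25, 0.50, 0.25]
total = sum(observed)
expected = [total * r for r in mendelian_ratio]

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With 3 categories there are 2 degrees of freedom, for which the
# chi-squared survival function has the closed form exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(f"expected = {expected}")
print(f"chi2 = {chi2:.3f}, p = {p_value:.2e}")
```

      Under these assumptions the expected counts match the bracketed values in the quoted sentence, and the resulting p-value falls well below the reported threshold of 0.0001.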

      15) Discussion LINES 362-372: This paragraph seems to be of only tangential relevance to the paper. Consider removing.

      Our screening strategy was a large-scale reverse genetic screen, but the number of lines was limited by the technical issues described above. We think it is important to mention that the strategy, if employed today, could benefit from newer technologies.

      16) Discussion. Another model is that Dmist and the NaK pump have a developmental effect. Arguing against this developmental model is the ouabain experiment.

      This is an important point. We’ve added the line (454:457): “We also cannot exclude a role for Dmist and the Na+/K+ pump in developmental events that impact sleep, although our observation that ouabain treatment, which inhibits the pump acutely after early development is complete, also impacts sleep, argues against a developmental role.”

      17) FIGURE 1G: Are these significance cut offs corrected for multiple comparisons?

      Yes, all the data is corrected for multiple comparisons.

      18) performing neuronal activity measures, either via neural activity imaging or phospho-ERK labeling in different mutants at day or night conditions, to determine whether baseline neuronal activity brain-wide or in specific brain regions are altered.

      These are excellent experiments that we plan to perform in the future.

      19) Please check all Figure numbers for accuracy.

      We have double checked these.

      20) The authors emphasize the role of increased cellular sodium, but equally plausibly, the phenotypes could be due to decreased cellular potassium. The potassium channel shaker has been previously identified as a critical sleep regulator in Drosophila.

      We completely agree. We would like to highlight that we did devote an entire paragraph to the possibility of changes in extracellular potassium in the discussion: “A third possibility is that Dmist and the Na+,K+-ATPase regulate sleep not by modulation of neuronal activity per se but rather via modulation of extracellular ion concentrations. Recent work has demonstrated that interstitial ions fluctuate across the sleep/wake cycle in mice. For example, extracellular K+ is high during wakefulness, and cerebrospinal fluid containing the ion concentrations found during wakefulness directly applied to the brain can locally shift neuronal activity into wake-like states (Ding et al., 2016). Given that the Na+,K+-ATPase actively exchanges Na+ ions for K+ , the high intracellular Na+ levels we observe in atp1a3a and dmist mutants is likely accompanied by high extracellular K+. Although we can only speculate at this time, a model in which extracellular ions that accumulate during wakefulness and then directly signal onto sleep-regulatory neurons could provide a direct link between Na+,K+ ATPase activity, neuronal firing, and sleep homeostasis. Such a model could also explain why disruption of fxyd1 in non-neuronal cells also leads to a reduction in night-time sleep.”

      We also agree that Shaker may be an important component of this sleep regulatory mechanism. Indeed, we previously showed that another potassium channel in zebrafish regulates sleep (Rihel et al., 2010).

      We have emphasized sodium homeostasis in our title and paper only because we were able to directly observe intracellular sodium levels, so we are confident that these have been altered in our mutants. We can only presume that potassium levels have also been altered, but we could not directly observe this.

      21) The similar phenotype between dmist and Fxyd1 in sleep reduction yet very different expression patterns, with dmist being mostly neuronal while fxyd1 being mostly non-neuronal, raise many possible questions: 1) are the sleep phenotypes due to neuronal Na/K imbalance? Or 2) Are the sleep phenotypes due to extracellular Na/K imbalance? Or 3) both? Some feasible experiments may help achieve a better mechanistic understanding of the observed sleep defects.

      Yes, we think these are excellent studies for future work. As noted in the previous point (20), we did discuss the possibility that changes to extracellular potassium might be a parsimonious explanation for the similar phenotypes of fxyd1 and dmist mutants.

      Future experiment suggestions (not required)

      1) Perform a double mutant analysis of fxyd1 and atp1a3a, to determine whether an epistatic relationship similar to that of dmist and atp1a3a is observed in the case of fxyd1 and atp1a3a.

      This is a great experiment that we will do in the future. Unfortunately, the fxyd1 mutant had been sperm frozen during the COVID-19 pandemic, so we cannot do this experiment at this time.

      2) Given the differences in the sleep phenotypes between vir/vir and i8/i8 mutants, would be informative to see the phenotype of the vir/i8 trans-heterozygote.

      This is also a good experiment to perform in the future. Since obtaining the cleaner i8 allele, the dmistvir/vir lines were sperm frozen.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Summary: This study by Magalhaes et al sheds light on the molecular underpinnings of the relative resistance of children to severe COVID-19. The authors found that priming of epithelial cells by resident immune cells to express tonic levels of the pattern recognition receptors (PRRs) MDA-5 and RIG-I predisposes the epithelial cells for a faster and more robust onset of IFN-beta production upon SARS-CoV-2 infection. The study uses a combination of in vitro and ex vivo models, as well as mining of scRNA-Seq datasets from clinical specimens.

      Major comments: The claims and conclusions are supported by the data and therefore no new experiments are needed.

      Optional

      1. The use of primary cells (i.e. human airway epithelial cultures cross talking to immune cells) would make this study more compelling, although I assume that the major findings would be recapitulated in such models.
      2. It is not clear how the use of Yersinia enterocolitica to trigger activation of PBMC is relevant to this story. Using different (commensal) pathogens to achieve PBMC activation may yield different and more physiologically relevant results.
      3. The manuscript would greatly benefit from improved structure and focus, particularly in the Abstract, Introduction and Results sections. The text is very dense, which makes it difficult for the reader to follow the flow and to distinguish important from less important information. In particular, the introduction starts very broadly by introducing COVID-19, which I think we are by now all familiar with. Directly starting with the burning question of why kids get less sick with SARS-CoV-2 would capture the readers' attention better. Figure 1a is beautiful for a review but much too dense to help the reader as a graphical abstract. In the results section, for each experiment, leading with a clear statement of the rationale for the specific question, the gap in knowledge and why the gap is there, followed by the results and then a summary of their impact, would make this a much more enjoyable read and help the reader evaluate the novelty and impact better, particularly for Figures 1, 2, and 3 (but also all others). The interaction wheel graphs (Figure 4) are amazing, but are not properly explained in the text (do I read this right that in adults, all the crosstalk is basically performed by proliferating T-cells?). In all, these scientific writing issues sell an otherwise beautiful story short.

      Referees cross-commenting

      I agree with reviewers 1 and 2 that the use of primary cells would significantly elevate the story. However, I think this should be "optional", as I do not think it would change the findings.

      Significance

      General assessment:

      The main strengths of the study are its topic and clearly relevant question: why do kids rarely get severe COVID-19? The main novelty is the answer to this question, that immune cell-epithelial crosstalk in children elevates the tonic expression of MDA5 and RIG-I via the IRF1 axis, leading to faster onset of IFN production and signaling upon SARS-CoV-2 challenge, which ultimately mounts an antiviral response detrimental to robust SARS-CoV-2 replication. The study uses an innovative combination of in vivo and ex vivo experiments and analysis of clinical specimens.

      The significant advance of this study to the field is clear to this reviewer, although it could be much better stated in the manuscript, as described at length above. The study is of great interest to the field of immunology and virology, and also has clinical and translational impact with respect to risk assessment for severe COVID-19 per age group, as well as epidemiological considerations for infection control.

    1. Author Response:

      We would like to thank the eLife reviewers for the considerable time and effort they have invested to review these manuscripts. We have also benefited from a previous round of review of the manuscript describing the proposed burial features, which underwent two rounds of revisions in a high-impact journal over a period of approximately 8 months during 2022 and early 2023. Both sets of reviews have reflected mixed responses to the evidence we have presented, with one reviewer recommending acceptance with minor editorial revisions, two recommending acceptance with minor revisions and the fourth recommending rejection based upon similar arguments to those reflected by some of the reviewers in this current round of reviews in eLife. Ultimately the managing editor of this first journal took the decision that the review process could not be completed in a timely manner and rejected the manuscript, although the submission here reflected our consideration of these reviewers' suggestions.

      We have chosen in this initial response to the eLife reviews to include some references to the previous anonymous reviews in order to illustrate differences of opinion and differences in revision suggestions within the review process. Our goal is to offer maximal insight into our decision-making process and to acknowledge the considerable time and effort put into the assessment of these manuscripts by reviewers (for eLife and in the case of the earlier review process). We hope that this approach will assist the readers, and reviewers, of our manuscripts in understanding why we are proceeding with certain decisions during the revision process.

      This is a new process for us and the reviewers, and one way in which it significantly differs from more traditional review is that both the reviews and our reply will be public well in advance of our revisions to the manuscript. Indeed, considering the scope of the reviews, some of those revisions may take considerable time, although many can be accomplished fairly easily. Thus, we are not in a position to say that we have solved every issue raised by the reviewers. Instead, we will examine what appear to be the key critical issues raised regarding the data and the analyses and how we propose to address these as we revise the papers. We will also address several philosophical and ethical issues raised by the reviews and our proposal for dealing with these. More specific editorial and citational recommendations will be dealt with on a case-by-case basis, and we do not address these point-by-point in this reply. Please note, this response to the reviewers is not the revision of the manuscript and is only the initial opinion of the corresponding authors with some guidance from the larger group of authors of all three papers. Our final submitted revision will reflect the input of all authors included on those submissions.

      We took the decision to submit three separate papers consciously. The two different categories of evidence, burials and engravings, involve different kinds of analysis and different (although overlapping) teams of researchers, and we recognized that each deserved their own presentation and assessment. Meanwhile, together they inform the context of H. naledi in a way that requires some synthetic discussion, in which both kinds of evidence are relevant, leading to a third paper. But the mutual relevance of these different kinds of evidence and their review by a common set of reviewers naturally raises cross-cutting issues, and the reviewers have cross-referenced the three articles. This has sometimes led to suggestions about one manuscript based on the contents of another. Considering the situation, we accepted the recommendation that it would be clearer to consider all three articles in a single reply. Thus, while each of the three papers will proceed separately during the revision process, it will be necessary to highlight across all three papers occasionally in our responses.

      Scientific Issues:

      In reading the reviews, we feel there are 9 critical points/assertions raised by one or more of the reviewers that present a problem for, or challenge to, our hypothesis that the observed evidence (bone accumulations and engravings) described in the Dinaledi subsystem are of intentional naledigenic origin. These are:

      1. The evidence presented does not demonstrate a clear interruption of the floor sediments, thus failing to demonstrate excavated holes.

      2. The sediments infilling the holes where the skeletal remains are found have not been demonstrated to originate from the disruption of the floor sediments and thus could be part of a natural geological process (e.g. water movement, slumping) or carnivore accumulations.

      3. Previous geological interpretations by our research group have given alternative geological explanations for formation of the bony accumulations that contradict the present evidence presented here and result in alternative origins hypotheses.

      4. Burial cannot be effectively assessed without complete excavation of the features and site.

      5. The skeletal remains as presented do not conform clearly to typical body arrangement/positions associated with human (Homo sapiens) burials.

      6. There is no evidence of grave goods or lithic scatters that are typically associated with human burials.

      7. Humans may have been involved with the creation of either the Homo naledi bone accumulations, the engravings, or both.

      8. Without a date of the engravings, the null hypothesis should be the engravings were created by Homo sapiens.

      9. The null hypothesis for explanation of the skeletal remains in this situation should be “natural accumulation”.

      Our analysis of the Dinaledi Feature 1 leads us to accept that the laminated orange-red mudstone (LORM) sedimentary layer is interrupted, indicating a non-natural intervention, and that the hole created by the interruption was then filled by a fleshed body (and perhaps parts of other bodies), which was then covered by sediment that originated from the hole that was dug. We recognize that the four eLife reviewers are not convinced that our presentation is sufficient to establish this. Interestingly, this was not the universal opinion of earlier reviewers of the initial manuscript, several of whom felt we had adequately supported this hypothesis. The lack of clarity in this current version of the burial manuscript is our responsibility. In the upcoming revision of this paper to be submitted, we will take the reviewers’ critiques to heart and add additional figures that better illustrate the disruption of the LORM and clarify the sedimentological data showing that the material covering the skeletal remains in the hole is the disrupted sediment excavated from the same hole. We are proposing to isolate this most critical evidence for burial into a separate section in the revised submission based on the reviewers’ comments. The fact that the LORM layer is disrupted, a fleshed body was placed in the hole created by this disruption, and the body (and perhaps parts of other bodies) was/were then covered by the same sediments from the hole is the central feature of our hypothesis that the bone accumulations observed reflect a burial and not a natural process.

      The possibility of fluvial transport or involvement in the subsystem is a topic that we have addressed extensively in past work, and it is clear from these reviews that we must enhance our current manuscript to discuss this issue at greater length. Our previous work (Dirks et al. 2015; Dirks et al. 2017) emphasized that fluvial transport of whole bodies into the subsystem was precluded by several lines of sedimentological evidence. We excavated a rich accumulation of skeletal remains, including articulated limbs and other elements in subvertical orientations inconsistent with slow sedimentary infill, which were difficult to explain without positing either a large and dense pile of bodies and/or sediment movement. We encountered fractured chunks of laminated orange-red mudstone (LORM) in random orientations within our excavation area, within and among skeletal remains, which directly refuted that the remains were inundated with water at the time of burial, and this limited the possibility of fluvial transport. Water flow sufficient to displace bodies or complete skeletal evidence would also transport large and coarse sediment, which is absent from the subsystem, and would sort the commingled skeletal material that we found by size, which we do not observe. But our excavation only covered less than a square meter at very limited depth, and this was the limit to our knowledge of subsurface sediment. We thus were left with uncertainty that led us to suggest the possibility of sediment slumping or movement into subsurface drains, although these were not observed near our excavation. Our current work expands our knowledge of the subsurface and presents an alternative explanation for the disposition of skeletal remains from our earlier excavation. But we acknowledge that this new explanation is vulnerable to our own previous published proposals, and we must do a better job of explaining how the new information addresses our previous suggestions. By not clearly creating a section where we explained how these previous hypotheses were now nullified by new evidence, we clearly confused the reviewers with our own previous work. We will revise the manuscript by enhancing the review of the significant geological evidence demonstrating that there is no significant fluvial action in the system and making it clear how the burial hypothesis provides a clearer explanation for the situation of skeletal remains from our previous excavation work.

      One of the central issues raised by reviewers has been a perceived need to excavate these features completely, totally exhuming all skeletal remains from them. Reviewers have written that it is necessary to identify every skeletal element that is present and account for any missing elements. On this point, we have both ethical and scientific differences from these reviewers. We express our ethical concerns first. Many of the best-preserved possible burials ever discovered by archaeologists were subjected to total excavation and exhumation. Cases like La Chapelle-aux-Saints, La Ferrassie, and Skhūl were fully excavated at a time when data recording and excavation methods did not include the range of spatial and geomorphological approaches that later became routine. The judgment of early investigators that these situations were intentional burials was challenged by later workers, and the kind of information that might enable better tests had been irrevocably lost (Gargett 1999; Dibble et al. 2015; Rendu et al. 2014).

      Later, improved excavation standards have not sufficed to remove uncertainty or debate about possible burials. For example, it was long presumed that well-preserved remains of young children were by themselves diagnostic of intentional burial, such as those from Dederiyeh, Border Cave, or Roc de Marsal. Such cases were also fully excavated, with adequate documentation of the positioning of skeletal remains and their surrounding stratigraphic situation, but such cases were later challenged on several bases and the complete exhumation of material has confused or precluded testing of new hypotheses (e.g. Gargett 1999). The case of Roc de Marsal is one in which data from the initial excavation combined with re-excavation and geoarchaeological analysis led to a naturalistic interpretation of the skeletal material (Sandgathe et al. 2011; Goldberg et al. 2017). But even in this case, the researchers erred in their interpretation of the skeleton’s situation due to a lack of identification of parts of the infant’s skeleton (Gómez-Olivencia and García-Martínez 2019). That is to say, it is not only the burial hypothesis but other hypotheses that suffer from complete excavation. Researchers concerned with preserving all possible information have sometimes taken extraordinary measures to remove and study possible burials at high resolution in the laboratory. Such was the case of the Shanidar IV burial removed from the site and transported in a plaster jacket by Solecki, which led to the disruption and loss of internal stratigraphic information (Pomeroy et al. 2020). Arguably, the current state of the art is full excavation with partial preparation, such as that undertaken at Panga ya Saidi (Martinón-Torres et al. 2021). But again, any future attempt to reinterpret or test the hypothesis of burial must rely on the adequacy of documentation as the original context has been removed.

      In our decision to leave material in place as much as possible, we are expanding upon standard practice to leave witness sections and unexcavated areas for future research. The situation is novel, representing possible burials by a nonhuman species, and that makes it doubly important in our opinion to be conservative in not fully exhuming the skeletal material from its context. We anticipate that many other researchers, including future investigators, will suggest additional methods to further test the hypothesis of burial, something that would be impossible if we had excavated the features in their entirety prior to publishing a description of our work. We believe strongly that our ethical responsibility is to publish the work and the most likely interpretation while leaving as much evidence in place as possible to enable further testing and replication. We welcome the suggestions of additional methods/analyses to test the H. naledi burial hypothesis.

      This being said, we also observe that total exhumation would not resolve the concerns raised by the reviewers. The recommendation of total exhumation is in pursuit of a full account of all skeletal material present and its preservation and spatial situation, in order to demonstrate that they conform to body positions comparable to human burials. As has been highlighted in forensic casework, the excavation of an inhumation feature does not necessarily provide an accurate spatial or anatomical manifest of the stratigraphical relationships between the body, encapsulating matrix, and any cut present due to preservational, taphonomic and operational factors (Dirkmaat and Cabo, 2016; Hunter, 2014). In particular, in cases where skeletal elements are highly fragmented, friable, or degraded (such as through bioerosion), complete excavation, even under controlled laboratory conditions, may destroy bone and severely limit skeletal identification (Henderson, 1987; Hochrein, 2002; Owsley and Compton, 1997), particularly in elements where the ratio of trabecular to cortical bone is high (Darwent and Lyman, 2002; Lyman, 1994). As such, non-invasive methods of 3D and 4D modelling (preservation in situ) are often considered preferable to complete necropsy or excavation (preservation by record) where appropriate (Bolliger and Thali, 2009; Dell’Unto and Landeschi, 2022; Randolph-Quinney et al., 2018; Silver, 2016).

      The test of burial is not primarily positional, but taphonomic and geological. The position and number of bones can elaborate on process-driven questions of decay and destruction in the burial environment, or post-mortem modification, but are not singularly indicative of whether the remains were intentionally buried – the post-mortem narrative of all the processes affecting the cadaveric island is required (Knüsel and Robb, 2016). In previous cases, researchers have disputed or accepted the hypothesis of intentional hominin burial based upon assumptions about how modern humans or Neandertals would have positioned bodies, with the idea that some positions reflect ritual intent while others do not. But applying such assumptions is unjustifiable, particularly for a species like H. naledi, whose culture may have differed fundamentally from our own. Our work acknowledges that the present evidence does not enable a full reconstruction of the burial positions, but it does show that fleshed remains were encased in sediment prior to decomposition of soft tissue, and that subsequent spatial changes can be most parsimoniously explained by natural decomposition within sedimentary matrix contained within a burial feature (after Green, 2022; Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022). If the argument is that extraordinary claims require extraordinary evidence, we feel that the evidence documents excavation and interment (and will do so more clearly in the revision), and the fact that the remains do not match a “typical” human burial in body positioning is not in itself evidence that these are not H. naledi burials.

      We feel that the reviewers (in keeping with many palaeoanthropologists) have a clear idea of what they “think” a burial should look like in an idealised sense, but this platonic ideal of burial form is not matched by the extensive literature in archaeothanatology, funerary archaeology and forensic science which indicates enormous variability in the activity, morphology and post-mortem system experienced by the human body in cases of interment and body disposal (e.g. Aspöck, 2008; Boulestin and Duday, 2005 and 2006; Connolly et al., 2005; Channing and Randolph-Quinney, 2006; Cherryson, 2008; Donnelly et al., 1999; Finley, 2000; Hunter, 2014; Parker Pearson, 1999; Randolph-Quinney, 2013). Decades of experience in the identification, recovery and interpretation of clandestine, deviant, and non-formal burials indicate that the platonic ideal is rare and, in many contexts, the exception (Cherryson, 2008; Parker Pearson, 1999). This variability is particularly relevant to morphological traits in burial context, such as the informal nature of the grave cut in plan and section, shallow burial depth, and initial disposition of body (placement) during the early post-mortem period. These might run counter to the expectations of reviewers or others referencing the fossil hominin record, but are well accepted within the communities of researchers investigating Holocene archaeological sites and forensic contexts.

      It is encouraging to see reviewers beginning to incorporate the extensive (often experimentally derived) literature from archaeothanatology and forensic taphonomy in their deliberations, and we will be taking these comments on board going forward. In particular, we acknowledge reviewers’ comments and the need to construct a more detailed post-mortem narrative, accounting for joint disarticulation (labile versus persistent joints etc.), displacement, and final disposition of elements within the burial space. As such we will incorporate the hierarchy of decomposition (rank order disarticulation), associations between regions of anatomical association, areas of disassociation, and the voids produced during decomposition (after Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022) into our narrative. In doing so we acknowledge the tensions between the inductive archaeothanatological narrative-driven approach (e.g. Duday, 2005 & 2009) versus robust decomposition data derived from human forensic taphonomic experimentation recently articulated by Schotsmans and colleagues (2022) - noting that we will highlight comparative data based on forensic experimental casework and actualistic modelling over inductive intuitive approaches which come with significant evidential shortcomings (Bristow et al. 2011).

      Finally, from a taphonomic perspective it is worth pointing out to reviewers that we have already addressed the issue of lack of taphonomic evidence for carnivore involvement in the formation of the Dinaledi assemblage (Dirks et al., 2016). Absence of any carnivore-induced bone surface modifications, patterns of skeletal part representation, and a total absence of any carnivore remains found within the Dinaledi Chamber (following Kuhn and colleagues, 2010) lead us to reject carnivores as possible vectors of body accumulation within the Dinaledi Chamber and Hill Antechamber.

      Reviewers suggest that without a date derived from geochronological methods, the engravings cannot be associated with H. naledi, and that it is possible (or probable) that the engravings were done in the recent past by H. sapiens. This suggestion neglects the context of the site. We have previously documented the structure and extremely limited accessibility of the Dinaledi subsystem. This subsystem was not recorded on maps of the documented Rising Star Cave system prior to our work and its discovery by our teams. Furthermore, there is no evidence of prehistoric human activity in the areas of the cave related to possible subterranean entrances. There is no evidence that humans in the past typically ventured into such extreme spaces like those of Rising Star. It is clear from the presence of the remains of many individuals that H. naledi ventured into these spaces again and again. It is likely that H. naledi moved through these spaces more easily than humans do based on their physique. We show that the engravings overlay each other, suggesting multiple engraving events. These engravings took time and effort and the only evidence for use of the Dinaledi subsystem by any hominin is by H. naledi. The context leads to the null hypothesis that H. naledi made the marks. In our revision, we will elaborate on this argument to clarify the evidence for our stance on this hypothesis. Several reviewers took issue with the title of the engraving paper as we did not insert a qualifier in front of the suggested date range for the engravings. We deliberately left out qualifying language so that the title took the form of a testable hypothesis rather than a weak assertion. Should future work find the engravings were not produced within this time range, then we will restate this hypothesis.

      Finally, with regard to the engravings, we have chosen to report them because they exist. Not reporting the presence of engraved marks on the walls of a cave above hypothesized burials would be tantamount to leaving relevant evidence out of the description of an archaeological context. We recognize and state in our manuscript that these markings require substantial further study, including attempts at geochronological dating. But the current evidence is clearly relevant to the archaeological context of the subsystem. We take a similar stance with reporting the presence of the tool-shaped artefact near the hand of the H. naledi skeleton in the Hill Antechamber. It is evident that this object requires further study, as we stated in our manuscript, but again omitting it from our study would be leaving out relevant evidence.

      Some have suggested that the null hypothesis should be that all of these observed circumstances are of natural origin. Our team took this approach in our early investigation of the Dinaledi subsystem (Dirks et al. 2015). We adopted the null hypothesis that the geological processes involved in the accumulation of H. naledi skeletal remains were “natural” (e.g., non-naledigenic involvement), and we were able to reject many alternative explanations for the assemblage, including carnivore accumulation, “death trap” accumulation, and fluvial transport of bodies or bones (Dirks et al. 2015). This led us to the hypothesis that H. naledi were involved in bringing the bodies into the spaces where they were found. But we did not hypothesize their involvement in the formation of the deposit itself beyond bringing the bodies to the location.

      This approach seems conservative. It followed the traditional view that small-brained hominins do not engage in cultural practices. But we recognize in hindsight that this null hypothesis approach did harm to our analyses. It impeded us from recognizing within our initial excavations of the puzzle box area and other excavations between 2014 – 2017 that we might be encountering remains that were intrusive in the sedimentary floor of the chamber. If we had approached the accumulation of a large number of hominins from the perspective of the null hypothesis being that the situation was likely cultural, we perhaps would have collected evidence in a slightly different manner. We certainly note that if the Dinaledi system had been full of the remains of modern humans, there would have been little doubt that the null hypothesis would have been that this was a cultural space and not a “natural space”.  We therefore respectfully disagree with the reviewers who continue to support the idea that we should approach hominin excavations with the null hypothesis that they will be natural (specifically non-cultural) in origins. If excavations continue with this mindset we believe that potential cultural evidence is almost certain to be lost.

      There has been a gradient across paleoanthropological excavations, archaeological work, and forensic investigation, with increasing precision of context. The reality is that the recording precision and frame of approach is typically different in most paleontological excavations than in those related to contemporary human remains. If anything comes from the present discussion of whether the Dinaledi system is a burial site for H. naledi or not, we hope that by taking seriously the possibility of deep cultural dynamics of hominins, we will encourage other teams to meet the highest standards of excavation in order to preserve potential cultural evidence. Given H. naledi’s cranial capacity, we suggest that even very early hominin skeletal assemblages should be re-examined, if there is sufficient evidence or records available. These would include examples such as the A.L. 333 Au. afarensis site (the so-called First Family site in Hadar, Ethiopia), the Dikika infant skeleton, WT 15000 (Turkana Boy) and even A.L. 288 (Lucy), as such unusual taphonomic situations where skeletons are preserved cannot be simply explained away as “natural” in origin, based solely on the cranial capacity and assumed lack of cognitive and cultural complexity of the hominins, as emphasized by us in Fuentes et al. (2023). We are not the first to observe that some very early hominin situations may represent early mortuary activity (Pettitt 2013), but we would advocate a step further. We suggest it may be damaging to take “natural accumulation” as the standard null hypothesis for hominin paleoanthropology, and that it is more conservative in practice to engage remains with the null hypothesis of possible cultural formation.

      We are deeply grateful for the time and effort all of the 8 reviewers (across three reviews) have taken with this work. We also acknowledge the anonymous reviewers from previous submissions whose opinions and comments will have made the final iterations of these manuscripts better for their efforts. As this process is rather public and includes commentary outside of the eLife forum, we ask that the efforts of all 37 authors and 8 reviewers involved be respected and that the discourse remain professional in all venues as we study this fascinating and quite complex occurrence. We appreciate also the efforts of members of the public who have engaged with this relatively new process where preprints are posted prior to the reviews, allowing comments and interactions from colleagues and the public who are normally not part of the internal peer review process. We believe these interactions will make for better final papers. We feel we have met the standards of demonstrating burials in H. naledi and that the engravings are most likely associated with H. naledi. However, given the reviews we see many areas where our clarity and context, and analyses, were less strong than they can be. With the clarifications and additions taken on board through these review processes the final papers will be stronger and clearer. We recognize that this is an ongoing process of scientific investigation and further work will allow continued, and possibly better, evaluation of these hypotheses and others.

      Lee R Berger, Agustín Fuentes, John Hawks, Tebogo Makhubela

      Works cited:

      • Aspöck, E. (2008). What Actually is a ‘Deviant Burial’?: Comparing German-Language and Anglophone Research on ‘Deviant Burials.’ In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books.  pp 17–34.

      • Bolliger, S.A. & Thali, M.J. (2009). Thanatology. In S.A. Bolliger and M.J. Thali (eds) Virtopsy Approach:  3D Optical and Radiological Scanning and Reconstruction in Forensic Medicine. Boca Raton: CRC Press. pp 187-218.

      • Boulestin, B. & Duday, H. (2005). Ethnologie et archéologie de la mort: de l’illusion des références à l’emploi d’un vocabulaire. In: C. Mordant and G. Depierre (eds) Les Pratiques Funéraires à l’Âge du Bronze en France. Actes de la table ronde de Sens-en-Bourgogne. Paris: Éditions du Comité des Travaux Historiques et Scientifiques. pp. 17–30.

      • Boulestin, B. & Duday, H. (2006). Ethnology and archaeology of death: from the illusion of references to the use of a terminology. Archaeologia Polona 44: 149–169.

      • Bristow, J., Simms, Z. & Randolph-Quinney, P.S. (2011). Taphonomy. In S. Black and E. Ferguson (eds.) Forensic Anthropology 2000-2010. Boca Raton, FL: CRC Press. pp 279-318.

      • Channing, J. & Randolph-Quinney, P.S. (2006). Death, decay and reconstruction: the archaeology of Ballykilmore Cemetery, County Westmeath. In J. O’Sullivan and M. Stanley (eds.) Settlement, Industry and Ritual: Archaeology. National Roads Authority Monograph Series No. 3. Dublin: NRA/Four Courts Press. pp 113-126.

      • Cherryson, A. K. (2008). Normal, Deviant and Atypical: Burial Variation in Late Saxon Wessex, c. AD 700–1100. In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books. pp 115–130.

      • Connolly, M., F. Coyne & L. G. Lynch (2005). Underworld : Death and Burial in Cloghermore Cave, Co. Kerry. Bray, Co. Wicklow: Wordwell.

      • Darwent, C. M. & R. L. Lyman (2002). Detecting  the postburial fragmentation of carpals, tarsals and phalanges. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press. pp 355-378.

      • d’Errico, F., & Backwell, L. (2016). Earliest evidence of personal ornaments associated with burial: The Conus shells from Border Cave. Journal of Human Evolution, 93, 91–108.

      • De Villiers, H. (1973). Human skeletal remains from Border Cave, Ingwavuma District, KwaZulu, South Africa. Annals of the Transvaal Museum, 28(13), 229–246.

      • Dell’Unto, N. and Landeschi, G. (2022). Archaeological 3D GIS. London: Routledge.

      • Dibble, H. L., Aldeias, V., Goldberg, P., McPherron, S. P., Sandgathe, D., & Steele, T. E. (2015). A critical look at evidence from La Chapelle-aux-Saints supporting an intentional Neandertal burial. Journal of Archaeological Science, 53, 649–657.

      • Dirkmaat, D. C., & Cabo, L. L. (2016). Forensic archaeology and forensic taphonomy: basic considerations on how to properly process and interpret the outdoor forensic scene. Academic Forensic Pathology 6, 439–454.

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. ELife, 4, e09561.

      • Dirks, P.H.G.M., Berger, L.R., Hawks, J., Randolph-Quinney, P.S., Backwell, L.R., and Roberts, E.M. (2016). Comment on “Deliberate body disposal by hominins in the Dinaledi Chamber, Cradle of Humankind, South Africa?” [J. Hum. Evol. 96 (2016) 145-148]. Journal of Human Evolution 96:  149-153.

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. ELife, 6, e24231.

      • Donnelly, S., C. Donnelly & E. Murphy (1999). The forgotten dead: The cíllíní and disused burial grounds of Ballintoy, County Antrim. Ulster Journal of Archaeology 58, 109-113.

      • Duday, H. (2005). L’archéothanatologie ou l’archéologie de la mort. In: O. Dutour, J.-J. Hublin and B. Vandermeersch (eds) Objets et Méthodes en Paléoanthropologie. Paris: Comité des Travaux Historiques et Scientifiques. pp. 153–215.

      • Duday, H. (2009). Archaeology of the Dead: Lectures in Archaeothanatology. Oxford: Oxbow Books.

      • Finley, N. (2000). Outside of life: Traditions of infant burial in Ireland from cillin to cist.  World Archaeology 31, 407-422.

      • Gargett, R. H. (1999). Middle Palaeolithic burial is not a dead issue: The view from Qafzeh, Saint-Césaire, Kebara, Amud, and Dederiyeh. Journal of Human Evolution, 37(1), 27–90.

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015.

      • Gómez-Olivencia, A., & García-Martínez, D. (2019). New postcranial remains from the Roc de Marsal Neandertal child. PALEO. Revue d’archéologie Préhistorique, 30–1, 30–1.

      • Green, E.C. (2022). An archaeothanatological approach to the identification of late Anglo-Saxon burials in wooden containers. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 436-455.

      • Henderson, J. (1987). Factors determining the state of preservation of human remains. In A. Boddington, A. Garland and R. Janaway (eds). Death, Decay and Reconstruction: Approaches to Archaeology and Forensic Science. Manchester: Manchester University Press. pp 43-54.

      • Hunter, J. R. (2014). Human remains recovery: archaeological and forensic perspectives. In C. Smith (ed). Encyclopedia of Global Archaeology. New York: Springer New York. pp 3549-3556.

      • Hochrein, M. (2002). An Autopsy of the Grave: Recognizing, Collecting and Preserving Forensic Geotaphonomic Evidence. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press: 45-70.

      • Knüsel, C.J. & Robb, J. (2016). Funerary taphonomy: An overview of goals and methods. Journal of Archaeological Science: Reports 10, 655-673.

      • Kuhn, B.F., Berger, L.R. & Skinner, J.D. (2010). Examining criteria for identifying and differentiating fossil faunal assemblages accumulated by hyenas and hominins using extant hyenid accumulations. International Journal of Osteoarchaeology 20, 15-35.

      • Lyman, R. (1994). Vertebrate Taphonomy. Cambridge, Cambridge University Press.

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), 7857.

      • Mickleburgh, H.L & Wescott, D.J. (2018). Controlled experimental observations on joint disarticulation and bone displacement of a human body in an open pit: implications for funerary archaeology. Journal of Archaeological Science: Reports 20: 158-167.

      • Mickleburgh, H.L., Wescott, D.J., Gluschitz, S. & Klinkenberg, V.M. (2022). Exploring the use of actualistic forensic taphonomy in the study of (forensic) archaeological human burials: An actualistic experimental research programme at the Forensic Anthropology Center at Texas State University (FACTS), San Marcos, Texas. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 542-562.

      • Owsley, D. & B. Compton (1997). Preservation in late 19th Century iron coffin burials. In W. Haglund and M. Sorg (eds). Forensic Taphonomy: The Postmortem Fate of Human Remains. Boca Raton, FL, CRC Press: 511-526.

      • Parker Pearson, M. (1999). The Archaeology of Death and Burial. College Station: Texas A&M University Press.

      • Pettitt, P. (2013). The Palaeolithic Origins of Human Burial. Routledge.

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26.

      • Randolph-Quinney, P.S. (2013). From the cradle to the grave: the bioarchaeology of Clonfad 3 and Ballykilmore 6. In N. Brady, P. Stevens and J. Channing (eds.). Settlement and Community in the Fir Tulach Kingdom. Dublin: National Roads Authority Press. pp A2.1-48.

      • Randolph-Quinney, P.S., Haines, S. and Kruger, A. (2018). The use of three-dimensional scanning and surface capture methods in recording forensic taphonomic traces: issues of technology, visualisation, and validation. In: W.J. M. Groen and P. M. Barone (eds). Multidisciplinary Approaches to Forensic Archaeology. Berlin: Springer International Publishing, pp. 115-130.

      • Rendu, W., Beauval, C., Crevecoeur, I., Bayle, P., Balzeau, A., Bismuth, T., Bourguignon, L., Delfour, G., Faivre, J.-P., Lacrampe-Cuyaubère, F., Tavormina, C., Todisco, D., Turq, A., & Maureille, B. (2014). Evidence supporting an intentional Neandertal burial at La Chapelle-aux-Saints. Proceedings of the National Academy of Sciences, 111(1), 81–86.

      • Sandgathe, D. M., Dibble, H. L., Goldberg, P., & McPherron, S. P. (2011). The Roc de Marsal Neandertal child: A reassessment of its status as a deliberate burial. Journal of Human Evolution, 61(3), 243–253.

      • Silver, M. (2016). Conservation Techniques in Cultural Heritage. In E. Stylianidis and F. Remondino (eds) 3D Recording, Documentation and Management of Cultural Heritage. Dunbeath: Whittles Publishing. pp 15-106.

      • Schotsmans, E.M.J., Georges-Zimmermann, P., Ueland, M. and Dent, B.B. (2022). From flesh to bone: Building bridges between taphonomy, archaeothanatology and forensic science for a better understanding of mortuary practices. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 501-541.

    1. Reviewer #3 (Public Review):

      Lee Berger and colleagues argue here that markings they have found in a dark isolated space in the Rising Star Cave system are likely over a quarter of a million years old and were made intentionally by Homo naledi, whose remains nearby they have previously reported. As in a European and much later case they reference ('Neanderthal engraved 'art' from the Pyrenees'), the entangled issues of demonstrable intentionality, persuasive age and likely authorship will generate much debate among the academic community of rock art specialists. The title of the paper and the reference to 'intentional designs', however, leave no room for doubt as to where the authors stand, despite avoidance of the word art, entering a very disputed terrain. Iain Davidson's (2020) 'Marks, pictures and art: their contributions to revolutions in communication', also referenced here, forms a useful and clearly articulated evolutionary framework for this debate. The key questions are: 'are the markings artefactual or natural?', 'how old are they?' and 'who made them?', questions often intertwined and here, as in the Pyrenees, completely inseparable. I do not think that these questions are definitively answered in this paper and I guess from the language used by the authors (may, might, seem etc.) that they do not think so either.

      First, a few referencing issues: the key reference quoted for distinguishing natural from artefactual markings (Fernandez-Jalvo et al. 2014), whilst mentioned in the text, is not included in the references. In the acknowledgements, the claim that "permits to conduct research in the Rising Star Cave system are provided by the South African National Research Foundation" should perhaps refer rather to SAHRA? In the primary description of their own markings from Rising Star and their presumed significance, there are, oddly, several unacknowledged quotes from the abstract of one of the most significant European references (Rodriguez-Vidal et al. 2014). These need attention.

      Before considering the specific arguments of the authors to justify the claims of the title, we should recognise the shift in the academic climate of those concerned with 'ancient markings' that has taken place over the past two or three decades. Before those changes, most specialists would probably have expected all early intentional markings to have been made by Homo sapiens after the African diaspora as part of the explosion of innovative behaviours thought to characterise the 'origins of modern humans'. Now, claims for earlier manifestations of such innovations from a wider geographic range are more favourably received, albeit often fiercely challenged as the case for Pyrenean Neanderthal 'art' shows (White et al. 2020). This change in intellectual thinking does not, however, alter the strict requirements for a successful assertion of earlier intentionality by non-sapiens species. We should also note that stone, despite its ubiquity in early human evolutionary contexts, is a recalcitrant material not easily directly dated whether in the form of walling, artefact manufacture or potentially meaningful markings. The stakes are high but the demands are no less so.

      Why are the markings not natural? Berger and co-authors seem to find support for the artefactual nature of the markings in their location along a passage connecting chambers in the underground Rising Star Cave system. The presumption is that the hominins passed by the marked panel frequently. I recognise the thinking but the argument is weak. More confidently they note that "In previous work researchers have noted the limited depth of artificial lines, their manufacture from multiple parallel striations, and their association into clear arrangement or pattern as evidence of hominin manufacture (Fernandez-Jalvo et al. 2014)". The markings in the Rising Star Cave are said to be shallow, made by repeated grooving with a pointed stone tool that has left striations within the grooves and to form designs that are "geometric expressions" including crosshatching and cruciform shapes. "Composition and ordering" are said to be detectable in the set of grooved markings. Readers of this and their texts will no doubt have various opinions about these matters, mostly related to rather poorly defined or quantified terminology. I reserve judgement, but would draw little comfort from the similarities among equally unconvincing examples of early, especially very early, 'designs'. Two or even three half-convincing arguments do not add up to one convincing one.

      The authors draw our attention to one very interesting issue: given the extensive grooving into the dolomite bedrock by sharp stone objects, where are these objects? Only one potential 'lithic artefact' is reported, a "tool-shaped rock [that] does resemble tools from other contexts of more recent age in southern Africa, such as a silcrete tool with abstract ochre designs on it that was recovered from Blombos Cave (Henshilwood et al. 2018)", also figured by Berger and colleagues. A number of problems derive from this comparison. First, 'tool-shaped rock' is surely a meaningless term: in a modern toolshed 'tool-shaped' would surely need to be refined into 'saw-shaped', 'hammer-shaped' or 'chisel-shaped' to convey meaning? The authors here seem to mean that the Rising Star Cave object is shaped like the Blombos painted stone fragment. But the latter is a painted fragment, not a tool and so any formal similarity is surely superficial and offers no support to the 'tool-ness' of the Rising Star Cave object. Does this mean that Homo naledi took (several?) pointed stone tools down the dark passageways, used them extensively and, whether worn out or still usable, took them all out again when they left? Not impossible, of course. And the lighting?

      The authors rightly note that the circumstance of the markings "makes it challenging to assess whether the engravings are contemporary with the Homo naledi burial evidence from only a few metres away" and more pertinently, whether the hominins did the markings. Despite this honest admission, they are prepared to hypothesise that the hominin marked, without, it seems, any convincing evidence. If archaeologists took juxtaposition to demonstrate authorship, there would be any number of unlikely claims for the authorship of rock paintings or even stone tools. The idea that there were no entries into this Cave system between the Homo naledi individuals and the last two decades is an assertion, not an observation, and the relationship between hominins and designs no less so. In fact, the only 'evidence' for the age of the markings is given by the age of the Homo naledi remains, as no attempt at the, admittedly very difficult, perhaps impossible, task of geochronological assessment, has been made.

      The claims relating to artificiality, age and authorship made here seem entangled, premature and speculative. Whilst there is no evidence to refute them, there isn't convincing evidence to confirm them.

      References:

      • Davidson, I. 2020. Marks, pictures and art: their contribution to revolutions in communication. Journal of Archaeological Method and Theory 27: 3 745-770.

      • Henshilwood, C.S. et al. 2018. An abstract drawing from the 73,000-year-old levels at Blombos Cave, South Africa. Nature 562: 115-118.

      • Rodriguez-Vidal, J. et al. 2014. A rock engraving made by Neanderthals in Gibraltar. Proceedings of the National Academy of Sciences.

      • White, Randall et al. 2020. Still no archaeological evidence that Neanderthals created Iberian cave art.

    2. Reviewer #4 (Public Review):

      This is potentially a landmark study with far-reaching consequences for archaeology, palaeoanthropology, and more widely. The antiquity of intentional human mark-making is a hot topic but this study – understood as initial – has as yet incomplete sources of evidence and methods; and it will be interesting to follow how the study develops in subsequent studies.

      Strengths and points to build on:

      * Heuristic potential: As knowledge advances it poses a risk to accepted knowledge – and we should accept that one such risk is moving on from long-held disciplinary tenets. In this case, there has been a growing quantum of evidence – all hotly debated – for the deep antiquity of mark-making and even symbolism by species other than ourselves. Most researchers now accept Neanderthal symbolic capacity actualised in burials, intentional mark-making and the like. The evidence here presented is not unequivocal but is very suggestive and an ideal test case for applying multi-disciplinary techniques of analysis and interpretation beyond the expertise of the listed authors (see comments in 'weaknesses'). This work by itself may be equivocal but when taken together with other such work, points to a 'human' sensu lato past that is as complex as it is long. This work then helps all researchers to at least be alive to the possibility of things like anthropic marks and residues in a context not normally thought to have it.

      * Decentering speciesism: As per the above comment, I appreciate empirical studies that erode speciesism – in particular studies that open up our minds to the possibility that multiple members of the Genus Homo were capable of intentional mark-making and even 'symbolic' behaviour, though this latter term is not well understood or uniformly used. This is probably because of continuous unconscious bias on our part as currently the only exemplar of our genus living - in contrast to most of the past in which different species and genera co-existed - if not on the same landscape and/or at exactly the same time, then with enough overlap that people would have realised 'others' were about either by sight and/or by encountering their physical remains and artefacts.

      * Problematising 'firsts' and deep time: A strength – but which needs to be developed in this manuscript – is our understanding of time and change. We have a plethora of dating techniques but relatively few substantive monographs, articles, and think tanks on time – and especially on how change comes about and what causes it. This leads us to privilege 'firsts' and the 'oldest' finds in 'deep' time above those that are more recent and in 'shallow' time. I would suggest in addition to the claims for the oldest of the reported marks, the authors develop nascent remarks on the possibility the suite of marks may have been made over time. This will help counter criticism that these marks – if established to be anthropic – were not just a singularity, but part of patterned behaviour, which would move it towards the realm of 'symbolic' cognitive behaviour. And indeed, it would be good to hear more about why in this place, these marks were made to establish a replicable model for identifying early anthropic marks.

      Ultimately, this manuscript presents evidence that those who are pro the deep antiquity of intentional mark-making by Homo (and possibly even other genera) will find enough evidence to support; while those sceptical of such claims will find enough methodological flaws and evidential limits to refute those claims. The next decade of work will likely be definitive and this article makes a key contribution to the debate.

      Weaknesses and points to attend to:

      * Definitions: The term 'rock engraving' is used rather uncritically and also the term 'etching' – and it would be useful to have a short definition of how the authors understand the term. Rock art scholars regularly debate these terms and whether they are or are not 'rock art' with its overwhelmingly visual bias; which this discovery may usefully help overthrow and advance.

      * Dating: There is no evidence provided for dating the marks found in the cave system. They could, for example, have been made more recently than the dates claimed – and by another species (if we accept their anthropogenic authorship). This is a perennial problem of much rock art research – especially when it comes to understanding the wider archaeological/palaeoanthropological context. More crucially, accurate dating allows a more reliable understanding of authorship and who/what was responsible for a particular artefact or feature. This has not been demonstrated in this case, though we do have fossil evidence of Homo naledi in the cave system. The article title is thus an incorrect and unsupported claim as the marks, if they are anthropic, have not been dated and are of unknown age. The authors allow that there may have been multiple episodes, but not that the marks can belong to a time other than they posit – either earlier, later, or distributed over a long period as the authors allow for in their concluding remarks.

      * Authorship: The study does not utilise either a geoscientist as one of the authorial team, or a rock art specialist. These are key oversights as the former would help better contextualise the dating of the marks reported on, as well as explore alternative non-anthropogenic agents that may have created the marks reported on. For example, the marks and 'pitting' etc may be the result of water bringing abrasive agents during times of flooding, hitting prominent rock features in the cave system. Some explanation is given from lines 114-124, but is uncited. The overlying 'sediment' may be similar to the mondmilch found in cave systems and which is of natural origin. It may be that these non-anthropogenic causes are easy to discount; but the arguments do need to be made. Or, that the polishing was made by Homo naledi brushing against the surfaces as they moved in the cave system, independent of any mark-making. A Table showing the pros and cons of intentional anthropic versus natural authorship would be very effective - as well as showing some of the natural linear marks in the cave system to avoid any confirmation or similar bias. FTIR analysis of the panel A-C would be more than useful to determine whether an additional layer of material has been added. This is mentioned for future work, but this seems a rather post-hoc research programme.

      * Use-wear analysis: If the marks are anthropic in origin; they are likely to have been made by a stone tool, which would leave characteristic marks, directionality and sequencing, distinct from natural causes. It is vital this work – such as was done on the Blombos engraved ochre – is done here – for example, linking to the chert and other tools described on lines 152-158. Note Figure 19, of such a tool, is very hard to make out. The Blombos – and Klasies River Mouth engraved ochres (curiously not referenced) – have very similar geometric markings and there is a real opportunity to compare these in securely dated contexts of 70-120 kya –which could support the argument made here for Homo naledi's cognitive capacity. On figure 16 it would be good to know on what basis some marks were selected as anthropic – and why others were not; this would help demonstrate the methodology and ability to distinguish between the two kinds of marks.

      * Viewshed: The rock art specialist would have added essential expertise on how to study anthropic marks. For example, the images of the marks shown are all of individual or small collections of motifs rather than showing each panel as well as all panels together, to help understand the iconographic context as an ensemble – a 'feature' rather than isolated 'artefacts' or 'motifs'. Line 60 mentions being able to see these as a 'triptych' but the reader is not able to have this view in this manuscript. From the cave map, it is not clear whether all three 'panels' (an unfortunate art historical term that suggests a framed entity - better to use a term like 'cluster') can be viewed simultaneously or in sequence. The view shed in relation to the area where the bodies were recovered is vaguely stated as 'only a few metres away' and is worth developing. I understand 3D scans have been made so it would be useful to have a version showing the marks in relation to where the bodies were recovered and as a 3-cluster ensemble.

      * Image enhancements: Also, in addition to polarised images, have colour enhancement tools like DStretch been tried to see if, for example, attempts at colouring with different coloured sands were made? Similarly, a 3D scan of the motif and panel – (Metashape is mentioned but not shown) – might assist in understanding how the marks and the rock they are on might relate to each other- as research in European upper Palaeolithic contexts has shown. Here, experimenting with different kinds of lighting - or in the absence of lighting, of tactility and how these marks and their rock support may have been experienced by those who may have made and interacted with them? As a note, it would be useful to have a scale in each image of the 'engravings' and it is a pity the one in situ photograph with the scale is not a standard rock art colour-corrected scale as is commonly used in rock art research.

    3. Author Response:

      We would like to thank the eLife reviewers for the considerable time and effort they have invested to review these manuscripts. We have also benefited from a previous round of review of the manuscript describing the proposed burial features, which underwent two rounds of revisions in a high-impact journal over a period of approximately 8 months during 2022 and early 2023. Both sets of reviews have reflected mixed responses to the evidence we have presented, with one reviewer recommending acceptance with minor editorial revisions, two recommending acceptance with minor revisions and the fourth recommending rejection based upon similar arguments to those reflected by some of the reviewers in this current round of reviews in eLife. Ultimately the managing editor of this first journal took the decision that the review process could not be completed in a timely manner and rejected the manuscript, although the submission here reflected our consideration of these reviewers' suggestions.

      We have chosen in this initial response to the eLife reviews to include some references to the previous anonymous reviews in order to illustrate differences of opinion and differences in revision suggestions within the review process. Our goal is to offer maximal insight into our decision-making process and to acknowledge the considerable time and effort put into the assessment of these manuscripts by reviewers (for eLife and in the case of the earlier review process). We hope that this approach will assist the readers, and reviewers, of our manuscripts in understanding why we are proceeding with certain decisions during the revision process.

      This is a new process for us and the reviewers, and one way in which it significantly differs from more traditional review is that both the reviews and our reply will be public well in advance of our revisions to the manuscript. Indeed, considering the scope of the reviews, some of those revisions may take considerable time, although many can be accomplished fairly easily. Thus, we are not in a position to say that we have solved every issue raised by the reviewers. Instead, we will examine what appear to be the key critical issues raised regarding the data and the analyses and how we propose to address these as we revise the papers. We will also address several philosophical and ethical issues raised by the reviews and our proposal for dealing with these. More specific editorial and citational recommendations will be dealt with on a case-by-case basis, and we do not address these point-by-point in this reply. Please note, this response to the reviewers is not the revision of the manuscript and is only the initial opinion of the corresponding authors with some guidance from the larger group of authors of all three papers. Our final submitted revision will reflect the input of all authors included on those submissions.

      We took the decision to submit three separate papers consciously. The two different categories of evidence, burials and engravings, involve different kinds of analysis and different (although overlapping) teams of researchers, and we recognized that each deserved their own presentation and assessment. Meanwhile, together they inform the context of H. naledi in a way that requires some synthetic discussion, in which both kinds of evidence are relevant, leading to a third paper. But the mutual relevance of these different kinds of evidence and their review by a common set of reviewers naturally raises cross-cutting issues, and the reviewers have cross-referenced the three articles. This has sometimes led to suggestions about one manuscript based on the contents of another. Considering the situation, we accepted the recommendation that it would be clearer to consider all three articles in a single reply. Thus, while each of the three papers will proceed separately during the revision process, it will be necessary to highlight across all three papers occasionally in our responses.

      Scientific Issues:

      In reading the reviews, we feel there are 9 critical points/assertions raised by one or more of the reviewers that present a problem for, or challenge to, our hypothesis that the observed evidence (bone accumulations and engravings) described in the Dinaledi subsystem are of intentional naledigenic origin. These are:

      1. The evidence presented does not demonstrate a clear interruption of the floor sediments, thus failing to demonstrate excavated holes.

      2. The sediments infilling the holes where the skeletal remains are found have not been demonstrated to originate from the disruption of the floor sediments and thus could be part of a natural geological process (e.g. water movement, slumping) or carnivore accumulations.

      3. Previous geological interpretations by our research group have given alternative geological explanations for formation of the bony accumulations that contradict the present evidence presented here and result in alternative origins hypotheses.

      4. Burial cannot be effectively assessed without complete excavation of the features and site.

      5. The skeletal remains as presented do not conform clearly to typical body arrangement/positions associated with human (Homo sapiens) burials.

      6. There is no evidence of grave goods or lithic scatters that are typically associated with human burials.

      7. Humans may have been involved with the creation of either the Homo naledi bone accumulations, the engravings, or both.

      8. Without a date of the engravings, the null hypothesis should be the engravings were created by Homo sapiens.

      9. The null hypothesis for explanation of the skeletal remains in this situation should be “natural accumulation”.

      Our analysis of the Dinaledi Feature 1 leads us to accept that the laminated orange-red mudstone (LORM) sedimentary layer is interrupted, indicating a non-natural intervention, and that the hole created by the interruption was then filled by both a fleshed body (and perhaps parts of other bodies) which were then covered by sediment that originated from the hole that was dug. We recognize that the four eLife reviewers are not convinced that our presentation is sufficient to establish this. Interestingly, this was not the universal opinion of earlier reviewers of the initial manuscript several of whom felt we had adequately supported this hypothesis. The lack of clarity in this current version of the burial manuscript is our responsibility. In the upcoming revision of this paper to be submitted, we will take the reviewers’ critiques to heart and add additional figures that illustrate better the disruption of the LORM and clarify the sedimentological data showing the material covering the skeletal remains in the hole are the disrupted sediments excavated from the same hole. We are proposing to isolate this most critical evidence for burial into a separate section in the revised submission based on the reviewers’ comments. The fact that the LORM layer is disrupted, a fleshed body was placed in the hole created by this disruption, and the body (and perhaps parts of other bodies) was/were then covered by the same sediments from the hole is the central feature of our hypothesis that the bone accumulations observed reflect a burial and not a natural process.

      The possibility of fluvial transport or involvement in the subsystem is a topic that we have addressed extensively in past work, and it is clear from these reviews that we must enhance our current manuscript to discuss this issue at greater length. Our previous work (Dirks et al. 2015; Dirks et al. 2017) emphasized that fluvial transport of whole bodies into the subsystem was precluded by several lines of sedimentological evidence. We excavated a rich accumulation of skeletal remains, including articulated limbs and other elements in subvertical orientations inconsistent with slow sedimentary infill, which were difficult to explain without positing either a large and dense pile of bodies and/or sediment movement. We encountered fractured chunks of laminated orange-red mudstone (LORM) in random orientations within our excavation area, within and among skeletal remains, which directly refuted that the remains were inundated with water at the time of burial, and this limited the possibility of fluvial transport. Water flow sufficient to displace bodies or complete skeletal evidence would also transport large and coarse sediment, which is absent from the subsystem, and would sort the commingled skeletal material that we found by size, which we do not observe. But our excavation only covered less than a square meter at very limited depth, and this was the limit to our knowledge of subsurface sediment. We thus were left with uncertainty that led us to suggest the possibility of sediment slumping or movement into subsurface drains, although these were not observed near our excavation. Our current work expands our knowledge of the subsurface and presents an alternative explanation for the disposition of skeletal remains from our earlier excavation. But we acknowledge that this new explanation is vulnerable to our own previous published proposals, and we must do a better job of explaining how the new information addresses our previous suggestions. By not clearly creating a section where we explained how these previous hypotheses were now nullified by new evidence, we clearly confused the reviewers with our own previous work. We will revise the manuscript by enhancing the review of the significant geological evidence demonstrating that there is no significant fluvial action in the system and making it clear how the burial hypothesis provides a clearer explanation for the situation of skeletal remains from our previous excavation work.

      One of the central issues raised by reviewers has been a perceived need to excavate these features completely, totally exhuming all skeletal remains from them. Reviewers have written that it is necessary to identify every skeletal element that is present and account for any missing elements. On this point, we have both ethical and scientific differences from these reviewers. We express our ethical concerns first. Many of the best-preserved possible burials ever discovered by archaeologists were subjected to total excavation and exhumation. Cases like La Chapelle-aux-Saints, La Ferrassie, and Skhūl were fully excavated at a time when data recording and excavation methods did not include the range of spatial and geomorphological approaches that later became routine. The judgment of early investigators that these situations were intentional burials was challenged by later workers, and the kind of information that might enable better tests had been irrevocably lost (Gargett 1999; Dibble et al. 2015; Rendu et al. 2014).

      Later, improved excavation standards have not sufficed to remove uncertainty or debate about possible burials. For example, it was long presumed that well-preserved remains of young children were by themselves diagnostic of intentional burial, such as those from Dederiyeh, Border Cave, or Roc de Marsal. Such cases were also fully excavated, with adequate documentation of the positioning of skeletal remains and their surrounding stratigraphic situation, but such cases were later challenged on several bases and the complete exhumation of material has confused or precluded testing of new hypotheses (e.g. Gargett 1999). The case of Roc de Marsal is one in which data from the initial excavation combined with re-excavation and geoarchaeological analysis led to a naturalistic interpretation of the skeletal material (Sandgathe et al. 2011; Goldberg et al. 2017). But even in this case, the researchers erred in their interpretation of the skeleton’s situation due to a lack of identification of parts of the infant’s skeleton (Gómez-Olivencia and García-Martinez 2019). That is to say, it is not only the burial hypothesis but other hypotheses that suffer from complete excavation. Researchers concerned with preserving all possible information have sometimes taken extraordinary measures to remove and study possible burials at high-resolution in the laboratory. Such was the case of the Shanidar IV burial removed from the site and transported in plaster jacket by Solecki, which led to the disruption and loss of internal stratigraphic information (Pomeroy et al. 2020). Arguably, the current state of the art is full excavation with partial preparation, such as that undertaken at Panga ya Saidi (Martinón-Torres et al. 2021). But again, any future attempt to reinterpret or test the hypothesis of burial must rely on the adequacy of documentation as the original context has been removed.

      In our decision to leave material in place as much as possible, we are expanding upon standard practice to leave witness sections and unexcavated areas for future research. The situation is novel, representing possible burials by a nonhuman species, and that makes it doubly important in our opinion to be conservative in not fully exhuming the skeletal material from its context. We anticipate that many other researchers, including future investigators, will suggest additional methods to further test the hypothesis of burial, something that would be impossible if we had excavated the features in their entirety prior to publishing a description of our work. We believe strongly that our ethical responsibility is to publish the work and the most likely interpretation while leaving as much evidence in place as possible to enable further testing and replication. We welcome the suggestions of additional methods/analyses to test the H. naledi burial hypothesis.

      This being said, we also observe that total exhumation would not resolve the concerns raised by the reviewers. The recommendation of total exhumation is in pursuit of a full account of all skeletal material present and its preservation and spatial situation, in order to demonstrate that they conform to body positions comparable to human burials. As has been highlighted in forensic casework, the excavation of an inhumation feature does not necessarily provide an accurate spatial or anatomical manifest of the stratigraphical relationships between the body, encapsulating matrix, and any cut present due to preservational, taphonomic and operational factors (Dirkmaat and Cabo, 2016; Hunter, 2014). In particular, in cases where skeletal elements are highly fragmented, friable, or degraded (such as through bioerosion) then complete excavation—even under controlled laboratory conditions—may destroy bone and severely limit skeletal identification (Henderson, 1997; Hochrein, 2002; Owsley and Compton, 1997), particularly in elements where the ratio of trabecular to cortical bone is high (Darwent and Lyman, 2002; Lyman, 1994). As such, non-invasive methods of 3D and 4D modelling (preservation in situ) are often considered preferable to complete necropsy or excavation (preservation by record) where appropriate (Bolliger and Thali, 2009; Dell’Unto and Landeschi, 2022; Randolph-Quinney et al., 2018; Silver, 2016). 

      The test of burial is not primarily positional, but taphonomic and geological. The position and number of bones can elaborate on process-driven questions of decay and destruction in the burial environment, or post-mortem modification, but are not singularly indicative of whether the remains were intentionally buried – the post-mortem narrative of all the processes affecting the cadaveric island is required (Knüsel and Robb, 2016). In previous cases, researchers have disputed or accepted the hypothesis of intentional hominin burial based upon assumptions about how modern humans or Neandertals would have positioned bodies, with the idea that some positions reflect ritual intent while others do not. But applying such assumptions is unjustifiable, particularly for a species like H. naledi, whose culture may have differed fundamentally from our own. Our work acknowledges that the present evidence does not enable a full reconstruction of the burial positions, but it does show that fleshed remains were encased in sediment prior to decomposition of soft tissue, and that subsequent spatial changes can be most parsimoniously explained by natural decomposition within sedimentary matrix contained within a burial feature (after Green, 2022; Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022). If the argument is that extraordinary claims require extraordinary evidence, we feel that the evidence documents excavation and interment (and will do so more clearly in the revision) and the fact that the remains do not match a “typical” human burial in body positioning is not in itself evidence that these are not H. naledi burials.

      We feel that the reviewers (in keeping with many palaeoanthropologists) have a clear idea of what they “think” a burial should look like in an idealised sense, but this platonic ideal of burial form is not matched by the extensive literature in archaeothanatology, funerary archaeology and forensic science which indicates enormous variability in the activity, morphology and post-mortem system experienced by the human body in cases of interment and body disposal (e.g. Aspöck, 2008; Boulestin and Duday, 2005 and 2006; Connelly et al., 2005; Channing and Randolph-Quinney, 2006; Cherryson, 2008; Donnelly et al., 1995; Finley, 2000; Hunter, 2014; Parker Pearson, 1999; Randolph-Quinney, 2013). Decades of experience in the identification, recovery and interpretation of clandestine, deviant, and non-formal burials indicates the platonic ideal is rare, and in many contexts, the exception (Cherryson, 2008; Parker Pearson, 1999). This variability is particularly relevant to morphological traits in burial context, such as the informal nature of the grave cut in plan and section, shallow burial depth, and initial disposition of body (placement) during the early post-mortem period. These might run counter to the expectations of reviewers or others referencing the fossil hominin record, but are well accepted within the communities of researchers investigating Holocene archaeological sites and forensic contexts.

      It is encouraging to see reviewers beginning to incorporate the extensive (often experimentally derived) literature from archaeothanatology and forensic taphonomy in their deliberations, and we will be taking these comments on board going forward. In particular, we acknowledge reviewers’ comments and the need to construct a more detailed post-mortem narrative, accounting for joint disarticulation (labile versus persistent joints etc.), displacement, and final disposition of elements within the burial space. As such we will incorporate the hierarchy of decomposition (rank order disarticulation), associations between regions of anatomical association, areas of disassociation, and the voids produced during decomposition (after Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022) into our narrative. In doing so we acknowledge the tensions between the inductive archaeothanatological narrative-driven approach (e.g. Duday, 2005 & 2009) versus robust decomposition data derived from human forensic taphonomic experimentation recently articulated by Schotsmans and colleagues (2022) - noting that we will highlight comparative data based on forensic experimental casework and actualistic modelling over inductive intuitive approaches which come with significant evidential shortcomings (Bristow et al. 2011).

      Finally, from a taphonomic perspective it is worth pointing out to reviewers that we have already addressed the issue of lack of taphonomic evidence for carnivore involvement in the formation of the Dinaledi assemblage (Dirks, et al., 2016). Absence of any carnivore-induced bone surface modifications, patterns of skeletal part representation, and a total absence of any carnivore remains found within the Dinaledi chamber (following Kuhn and colleagues, 2010) lead us to reject carnivores as possible vectors of body accumulation within the Dinaledi Chamber and Hill Antechamber.

      Reviewers suggest that without a date derived from geochronological methods, the engravings cannot be associated with H. naledi, and that it is possible (or probable) that the engravings were done in the recent past by H. sapiens. This suggestion neglects the context of the site. We have previously documented the structure and extremely limited accessibility of the Dinaledi subsystem. This subsystem was not recorded on maps of the documented Rising Star Cave system prior to our work and its discovery by our teams. Furthermore, there is no evidence of prehistoric human activity in the areas of the cave related to possible subterranean entrances. There is no evidence that humans in the past typically ventured into such extreme spaces as those of Rising Star. It is clear from the presence of the remains of many individuals that H. naledi ventured into these spaces again and again. It is likely that H. naledi moved through these spaces more easily than humans do based on their physique. We show that the engravings overlay each other, suggesting multiple engraving events. These engravings took time and effort, and the only evidence for use of the Dinaledi subsystem by any hominin is by H. naledi. The context leads to the null hypothesis that H. naledi made the marks. In our revision, we will elaborate on this argument to clarify the evidence for our stance on this hypothesis. Several reviewers took issue with the title of the engraving paper as we did not insert a qualifier in front of the suggested date range for the engravings. We deliberately left out qualifying language so that the title took the form of a testable hypothesis rather than a weak assertion. Should future work find the engravings were not produced within this time range, then we will restate this hypothesis.

      Finally, with regard to the engravings, we have chosen to report them because they exist. Not reporting the presence of engraved marks on the walls of a cave above hypothesized burials would be tantamount to leaving relevant evidence out of the description of an archeological context. We recognize and state in our manuscript that these markings require substantial further study, including attempts at geochronological dating. But the current evidence is clearly relevant to the archaeological context of the subsystem. We take a similar stance with reporting the presence of the tool-shaped artefact near the hand of the H. naledi skeleton in the Hill Antechamber. It is evident that this object requires further study, as we stated in our manuscript, but again omitting it from our study would be leaving out relevant evidence.

      Some have suggested that the null hypothesis should be that all of these observed circumstances are of natural origin. Our team took this approach in our early investigation of the Dinaledi subsystem (Dirks et al. 2015). We adopted the null hypothesis that the geological processes involved in the accumulation of H. naledi skeletal remains were “natural” (e.g., non-naledigenic involvement), and we were able to reject many alternative explanations for the assemblage, including carnivore accumulation, “death trap” accumulation, and fluvial transport of bodies or bones (Dirks et al. 2015). This led us to the hypothesis that H. naledi were involved in bringing the bodies into the spaces where they were found. But we did not hypothesize their involvement in the formation of the deposit itself beyond bringing the bodies to the location.

      This approach seems conservative. It followed the traditional view that small-brained hominins do not engage in cultural practices. But we recognize in hindsight that this null hypothesis approach did harm to our analyses. It impeded us from recognizing within our initial excavations of the puzzle box area and other excavations between 2014 and 2017 that we might be encountering remains that were intrusive in the sedimentary floor of the chamber. If we had approached the accumulation of a large number of hominins from the perspective of the null hypothesis being that the situation was likely cultural, we perhaps would have collected evidence in a slightly different manner. We certainly note that if the Dinaledi system had been full of the remains of modern humans, there would have been little doubt that the null hypothesis would have been that this was a cultural space and not a “natural space”. We therefore respectfully disagree with the reviewers who continue to support the idea that we should approach hominin excavations with the null hypothesis that they will be natural (specifically non-cultural) in origins. If excavations continue with this mindset we believe that potential cultural evidence is almost certain to be lost.

      There has been a gradient across paleoanthropological excavations, archaeological work, and forensic investigation, with increasing precision of context. The reality is that the recording precision and frame of approach is typically different in most paleontological excavations than in those related to contemporary human remains. If anything comes from the present discussion of whether the Dinaledi system is a burial site for H. naledi or not, we hope that by taking seriously the possibility of deep cultural dynamics of hominins, we will encourage other teams to meet the highest standards of excavation in order to preserve potential cultural evidence. Given H. naledi’s cranial capacity we suggest that even very early hominin skeletal assemblages should be re-examined, if there is sufficient evidence or records available. These would include examples such as the A.L. 333 Au. afarensis site (the so-called First Family site in Hadar, Ethiopia), the Dikika infant skeleton, WT 15000 (Turkana Boy) and even A.L. 288 (Lucy), as such unusual taphonomic situations where skeletons are preserved cannot be simply explained away as “natural” in origin, based solely on the cranial capacity and assumed lack of cognitive and cultural complexity of the hominins, as emphasized by us in Fuentes et al. (2023). We are not the first to observe that some very early hominin situations may represent early mortuary activity (Pettitt 2013), but we would advocate a step further. We suggest it may be damaging to take “natural accumulation” as the standard null hypothesis for hominin paleoanthropology, and that it is more conservative in practice to engage remains with the null hypothesis of possible cultural formation.

      We are deeply grateful for the time and effort all of the 8 reviewers (across three reviews) have taken with this work. We also acknowledge the anonymous reviewers from previous submissions whose opinions and comments will have made the final iterations of these manuscripts better for their efforts. As this process is rather public and includes commentary outside of the eLife forum, we ask that the efforts of all 37 authors and 8 reviewers involved be respected and that the discourse remain professional in all venues as we study this fascinating and quite complex occurrence. We appreciate also the efforts of members of the public who have engaged with this relatively new process where preprints are posted prior to the reviews, allowing comments and interactions from colleagues and the public who are normally not part of the internal peer review process. We believe these interactions will make for better final papers. We feel we have met the standards of demonstrating burials in H. naledi and that the engravings are most likely associated with H. naledi. However, given the reviews we see many areas where our clarity and context, and analyses, were less strong than they can be. With the clarifications and additions taken on board through these review processes the final papers will be stronger and clearer. We recognize that this is an ongoing process of scientific investigation and further work will allow continued, and possibly better, evaluation of these hypotheses and others.

      Lee R Berger, Agustín Fuentes, John Hawks, Tebogo Makhubela

      Works cited:

      • Aspöck, E. (2008). What Actually is a ‘Deviant Burial’?: Comparing German-Language and Anglophone Research on ‘Deviant Burials.’ In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books.  pp 17–34.

      • Bolliger, S.A. & Thali, M.J. (2009). Thanatology. In S.A. Bolliger and M.J. Thali (eds) Virtopsy Approach:  3D Optical and Radiological Scanning and Reconstruction in Forensic Medicine. Boca Raton: CRC Press. pp 187-218.

      • Boulestin, B. & Duday, H. (2005). Ethnologie et archéologie de la mort: de l’illusion des références à l’emploi d’un vocabulaire. In: C. Mordant and G. Depierre (eds) Les Pratiques Funéraires à l’Âge du Bronze en France. Actes de la table ronde de Sens-en-Bourgogne. Paris: Éditions du Comité des Travaux Historiques et Scientifiques. pp. 17–30.

      • Boulestin, B. & Duday, H. (2006). Ethnology and archaeology of death: from the illusion of references to the use of a terminology. Archaeologia Polona 44: 149–169.

      • Bristow, J., Simms, Z. & Randolph-Quinney, P.S. (2011). Taphonomy. In S. Black and E. Ferguson (eds.) Forensic Anthropology 2000-2010. Boca Raton, FL: CRC Press. pp 279-318.

      • Channing, J. & Randolph-Quinney, P.S. (2006). Death, decay and reconstruction: the archaeology of Ballykilmore Cemetery, County Westmeath. In J. O’Sullivan and M. Stanley (eds.) Settlement, Industry and Ritual: Archaeology. National Roads Authority Monograph Series No. 3. Dublin: NRA/Four Courts Press. pp 113-126.

      • Cherryson, A. K. (2008). Normal, Deviant and Atypical: Burial Variation in Late Saxon Wessex, c. AD 700–1100. In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books. pp 115–130.

      • Connolly, M., F. Coyne & L. G. Lynch (2005). Underworld: Death and Burial in Cloghermore Cave, Co. Kerry. Bray, Co. Wicklow: Wordwell.

      • Darwent, C. M. & R. L. Lyman (2002). Detecting  the postburial fragmentation of carpals, tarsals and phalanges. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press. pp 355-378.

      • d’Errico, F., & Backwell, L. (2016). Earliest evidence of personal ornaments associated with burial: The Conus shells from Border Cave. Journal of Human Evolution, 93, 91–108.

      • De Villiers, H. (1973). Human skeletal remains from Border Cave, Ingwavuma District, KwaZulu, South Africa. Annals of the Transvaal Museum, 28(13), 229–246.

      • Dell’Unto, N. and Landeschi, G. (2022). Archaeological 3D GIS. London: Routledge.

      • Dibble, H. L., Aldeias, V., Goldberg, P., McPherron, S. P., Sandgathe, D., & Steele, T. E. (2015). A critical look at evidence from La Chapelle-aux-Saints supporting an intentional Neandertal burial. Journal of Archaeological Science, 53, 649–657.

      • Dirkmaat, D. C., & Cabo, L. L. (2016). Forensic archaeology and forensic taphonomy: basic considerations on how to properly process and interpret the outdoor forensic scene. Academic Forensic Pathology 6, 439–454.

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. ELife, 4, e09561.

      • Dirks, P.H.G.M., Berger, L.R., Hawks, J., Randolph-Quinney, P.S., Backwell, L.R., and Roberts, E.M. (2016). Comment on “Deliberate body disposal by hominins in the Dinaledi Chamber, Cradle of Humankind, South Africa?” [J. Hum. Evol. 96 (2016) 145-148]. Journal of Human Evolution 96:  149-153.

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. ELife, 6, e24231.

      • Donnelly, S., C. Donnelly & E. Murphy (1999). The forgotten dead: The cíllíní and disused burial grounds of Ballintoy, County Antrim. Ulster Journal of Archaeology 58, 109-113.

      • Duday, H. (2005). L’archéothanatologie ou l’archéologie de la mort. In: O. Dutour, J.-J. Hublin and B. Vandermeersch (eds) Objets et Méthodes en Paléoanthropologie. Paris: Comité des Travaux Historiques et Scientifiques. pp. 153–215.

      • Duday, H. (2009). Archaeology of the Dead: Lectures in Archaeothanatology. Oxford: Oxbow Books.

      • Finley, N. (2000). Outside of life: Traditions of infant burial in Ireland from cillin to cist.  World Archaeology 31, 407-422.

      • Gargett, R. H. (1999). Middle Palaeolithic burial is not a dead issue: The view from Qafzeh, Saint-Césaire, Kebara, Amud, and Dederiyeh. Journal of Human Evolution, 37(1), 27–90.

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015.

      • Gómez-Olivencia, A., & García-Martínez, D. (2019). New postcranial remains from the Roc de Marsal Neandertal child. PALEO. Revue d’archéologie Préhistorique, 30–1, 30–1.

      • Green, E.C. (2022). An archaeothanatological approach to the identification of late Anglo-Saxon burials in wooden containers. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 436-455.

      • Henderson, J. (1987). Factors determining the state of preservation of human remains. In A. Boddington, A. Garland and R. Janaway (eds). Death, Decay and Reconstruction: Approaches to Archaeology and Forensic Science. Manchester: Manchester University Press. pp 43-54.

      • Hunter, J. R. (2014). Human remains recovery: archaeological and forensic perspectives. In C. Smith (ed). Encyclopedia of Global Archaeology. New York: Springer New York. pp 3549-3556.

      • Hochrein, M. (2002). An Autopsy of the Grave: Recognizing, Collecting and Preserving Forensic Geotaphonomic Evidence. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press: 45-70.

      • Knüsel, C.K. & Robb, J. (2016). Funerary taphonomy: An overview of goals and methods. Journal of Archaeological Science: Reports 10, 655-673.

      • Kuhn, B.F., Berger, L.R. & Skinner, J.D. (2010). Examining criteria for identifying and differentiating fossil faunal assemblages accumulated by hyenas and hominins using extant hyenid accumulations. International Journal of Osteoarchaeology 20, 15-35.

      • Lyman, R. (1994). Vertebrate Taphonomy. Cambridge, Cambridge University Press.

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), 7857.

      • Mickleburgh, H.L & Wescott, D.J. (2018). Controlled experimental observations on joint disarticulation and bone displacement of a human body in an open pit: implications for funerary archaeology. Journal of Archaeological Science: Reports 20: 158-167.

      • Mickleburgh, H.L., Wescott, D.J., Gluschitz, S. & Klinkenberg, V.M. (2022). Exploring the use of actualistic forensic taphonomy in the study of (forensic) archaeological human burials: An actualistic experimental research programme at the Forensic Anthropology Center at Texas State University (FACTS), San Marcos, Texas. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 542-562.

      • Owsley, D. & B. Compton (1997). Preservation in late 19th Century iron coffin burials. In W. Haglund and M. Sorg (eds). Forensic Taphonomy: The Postmortem Fate of Human Remains. Boca Raton, FL, CRC Press: 511-526.

      • Parker Pearson, M. (1999). The Archaeology of Death and Burial. College Station: Texas A&M University Press.

      • Pettitt, P. (2013). The Palaeolithic Origins of Human Burial. Routledge.

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26.

      • Randolph-Quinney, P.S. (2013). From the cradle to the grave: the bioarchaeology of Clonfad 3 and Ballykilmore 6. In N. Brady, P. Stevens and J. Channing (eds.). Settlement and Community in the Fir Tulach Kingdom. Dublin: National Roads Authority Press. pp A2.1-48.

      • Randolph-Quinney, P.S., Haines, S. and Kruger, A. (2018). The use of three-dimensional scanning and surface capture methods in recording forensic taphonomic traces: issues of technology, visualisation, and validation. In: W.J. M. Groen and P. M. Barone (eds). Multidisciplinary Approaches to Forensic Archaeology. Berlin: Springer International Publishing, pp. 115-130.

      • Rendu, W., Beauval, C., Crevecoeur, I., Bayle, P., Balzeau, A., Bismuth, T., Bourguignon, L., Delfour, G., Faivre, J.-P., Lacrampe-Cuyaubère, F., Tavormina, C., Todisco, D., Turq, A., & Maureille, B. (2014). Evidence supporting an intentional Neandertal burial at La Chapelle-aux-Saints. Proceedings of the National Academy of Sciences, 111(1), 81–86.

      • Sandgathe, D. M., Dibble, H. L., Goldberg, P., & McPherron, S. P. (2011). The Roc de Marsal Neandertal child: A reassessment of its status as a deliberate burial. Journal of Human Evolution, 61(3), 243–253.

      • Silver, M. (2016). Conservation Techniques in Cultural Heritage. In E. Stylianidis and F. Remondino (eds) 3D Recording, Documentation and Management of Cultural Heritage. Dunbeath: Whittles Publishing. pp 15-106.

      • Schotsmans, E.M.J., Georges-Zimmermann, P., Ueland, M. and Dent, B.B. (2022). From flesh to bone: Building bridges between taphonomy, archaeothanatology and forensic science for a better understanding of mortuary practices. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 501-541.

    1. Author Response:

      We would like to thank the eLife reviewers for the considerable time and effort they have invested to review these manuscripts. We have also benefited from a previous round of review of the manuscript describing the proposed burial features, which underwent two rounds of revisions in a high-impact journal over a period of approximately 8 months during 2022 and early 2023. Both sets of reviews have reflected mixed responses to the evidence we have presented, with one reviewer recommending acceptance with minor editorial revisions, two recommending acceptance with minor revisions, and the fourth recommending rejection based upon similar arguments to those reflected by some of the reviewers in this current round of reviews in eLife. Ultimately the managing editor of this first journal took the decision that the review process could not be completed in a timely manner and rejected the manuscript, although the submission here reflected our consideration of these reviewers’ suggestions.

      We have chosen in this initial response to the eLife reviews to include some references to the previous anonymous reviews in order to illustrate differences of opinion and differences in revision suggestions within the review process. Our goal is to offer maximal insight into our decision-making process and to acknowledge the considerable time and effort put into the assessment of these manuscripts by reviewers (for eLife and in the case of the earlier review process). We hope that this approach will assist the readers, and reviewers, of our manuscripts in understanding why we are proceeding with certain decisions during the revision process.

      This is a new process for us and the reviewers, and one way in which it significantly differs from more traditional review is that both the reviews and our reply will be public well in advance of our revisions to the manuscript. Indeed, considering the scope of the reviews, some of those revisions may take considerable time, although many can be accomplished fairly easily. Thus, we are not in a position to say that we have solved every issue raised by the reviewers. Instead, we will examine what appear to be the key critical issues raised regarding the data and the analyses and how we propose to address these as we revise the papers. We will also address several philosophical and ethical issues raised by the reviews and our proposal for dealing with these. More specific editorial and citational recommendations will be dealt with on a case-by-case basis, and we do not address these point-by-point in this reply. Please note, this response to the reviewers is not the revision of the manuscript and is only the initial opinion of the corresponding authors with some guidance from the larger group of authors of all three papers. Our final submitted revision will reflect the input of all authors included on those submissions.

      We took the decision to submit three separate papers consciously. The two different categories of evidence, burials and engravings, involve different kinds of analysis and different (although overlapping) teams of researchers, and we recognized that each deserved its own presentation and assessment. Meanwhile, together they inform the context of H. naledi in a way that requires some synthetic discussion, in which both kinds of evidence are relevant, leading to a third paper. But the mutual relevance of these different kinds of evidence and their review by a common set of reviewers naturally raises cross-cutting issues, and the reviewers have cross-referenced the three articles. This has sometimes led to suggestions about one manuscript based on the contents of another. Considering the situation, we accepted the recommendation that it would be clearer to consider all three articles in a single reply. Thus, while each of the three papers will proceed separately during the revision process, it will occasionally be necessary to refer across all three papers in our responses.

      Scientific Issues:

      In reading the reviews, we feel there are 9 critical points/assertions raised by one or more of the reviewers that present a problem for, or challenge to, our hypothesis that the observed evidence (bone accumulations and engravings) described in the Dinaledi subsystem are of intentional naledigenic origin. These are:

      1. The evidence presented does not demonstrate a clear interruption of the floor sediments, thus failing to demonstrate excavated holes.

      2. The sediments infilling the holes where the skeletal remains are found have not been demonstrated to originate from the disruption of the floor sediments and thus could be part of a natural geological process (e.g. water movement, slumping) or carnivore accumulations.

      3. Previous geological interpretations by our research group have given alternative geological explanations for formation of the bony accumulations that contradict the present evidence presented here and result in alternative origins hypotheses.

      4. Burial cannot be effectively assessed without complete excavation of the features and site.

      5. The skeletal remains as presented do not conform clearly to typical body arrangement/positions associated with human (Homo sapiens) burials.

      6. There is no evidence of grave goods or lithic scatters that are typically associated with human burials.

      7. Humans may have been involved with the creation of either the Homo naledi bone accumulations, the engravings, or both.

      8. Without a date of the engravings, the null hypothesis should be the engravings were created by Homo sapiens.

      9. The null hypothesis for explanation of the skeletal remains in this situation should be “natural accumulation”.

      Our analysis of the Dinaledi Feature 1 leads us to accept that the laminated orange-red mudstone (LORM) sedimentary layer is interrupted, indicating a non-natural intervention, and that the hole created by the interruption was then filled by a fleshed body (and perhaps parts of other bodies), which was then covered by sediment that originated from the hole that was dug. We recognize that the four eLife reviewers are not convinced that our presentation is sufficient to establish this. Interestingly, this was not the universal opinion of earlier reviewers of the initial manuscript, several of whom felt we had adequately supported this hypothesis. The lack of clarity in this current version of the burial manuscript is our responsibility. In the upcoming revision of this paper to be submitted, we will take the reviewers’ critiques to heart and add additional figures that illustrate better the disruption of the LORM and clarify the sedimentological data showing that the material covering the skeletal remains in the hole is the disrupted sediment excavated from the same hole. We are proposing to isolate this most critical evidence for burial into a separate section in the revised submission based on the reviewers’ comments. The fact that the LORM layer is disrupted, a fleshed body was placed in the hole created by this disruption, and the body (and perhaps parts of other bodies) was/were then covered by the same sediments from the hole is the central feature of our hypothesis that the bone accumulations observed reflect a burial and not a natural process.

      The possibility of fluvial transport or involvement in the subsystem is a topic that we have addressed extensively in past work, and it is clear from these reviews that we must enhance our current manuscript to discuss this issue at greater length. Our previous work (Dirks et al. 2015; Dirks et al. 2017) emphasized that fluvial transport of whole bodies into the subsystem was precluded by several lines of sedimentological evidence. We excavated a rich accumulation of skeletal remains, including articulated limbs and other elements in subvertical orientations inconsistent with slow sedimentary infill, which were difficult to explain without positing either a large and dense pile of bodies and/or sediment movement. We encountered fractured chunks of laminated orange-red mudstone (LORM) in random orientations within our excavation area, within and among skeletal remains, which directly refuted that the remains were inundated with water at the time of burial, and this limited the possibility of fluvial transport. Water flow sufficient to displace bodies or complete skeletal evidence would also transport large and coarse sediment, which is absent from the subsystem, and would sort the commingled skeletal material that we found by size, which we do not observe. But our excavation covered less than a square meter at very limited depth, and this was the limit to our knowledge of subsurface sediment. We thus were left with uncertainty that led us to suggest the possibility of sediment slumping or movement into subsurface drains, although these were not observed near our excavation. Our current work expands our knowledge of the subsurface and presents an alternative explanation for the disposition of skeletal remains from our earlier excavation. But we acknowledge that this new explanation is vulnerable to our own previous published proposals, and we must do a better job of explaining how the new information addresses our previous suggestions.
By not clearly creating a section where we explained how these previous hypotheses were now nullified by new evidence, we clearly confused the reviewers with our own previous work. We will revise the manuscript by enhancing the review of the significant geological evidence demonstrating that there is no significant fluvial action in the system and making it clear how the burial hypothesis provides a clearer explanation for the situation of skeletal remains from our previous excavation work.

      One of the central issues raised by reviewers has been a perceived need to excavate these features completely, totally exhuming all skeletal remains from them. Reviewers have written that it is necessary to identify every skeletal element that is present and account for any missing elements. On this point, we have both ethical and scientific differences from these reviewers. We express our ethical concerns first. Many of the best-preserved possible burials ever discovered by archaeologists were subjected to total excavation and exhumation. Cases like La Chapelle-aux-Saints, La Ferrassie, and Skhūl were fully excavated at a time when data recording and excavation methods did not include the range of spatial and geomorphological approaches that later became routine. The judgment of early investigators that these situations were intentional burials was challenged by later workers, and the kind of information that might enable better tests had been irrevocably lost (Gargett 1999; Dibble et al. 2015; Rendu et al. 2014).

      Later, improved excavation standards have not sufficed to remove uncertainty or debate about possible burials. For example, it was long presumed that well-preserved remains of young children were by themselves diagnostic of intentional burial, such as those from Dederiyeh, Border Cave, or Roc de Marsal. Such cases were also fully excavated, with adequate documentation of the positioning of skeletal remains and their surrounding stratigraphic situation, but such cases were later challenged on several bases and the complete exhumation of material has confused or precluded testing of new hypotheses (e.g. Gargett 1999). The case of Roc de Marsal is one in which data from the initial excavation combined with re-excavation and geoarchaeological analysis led to a naturalistic interpretation of the skeletal material (Sandgathe et al. 2011; Goldberg et al. 2017). But even in this case, the researchers erred in their interpretation of the skeleton’s situation due to a lack of identification of parts of the infant’s skeleton (Gómez-Olivencia and García-Martínez 2019). That is to say, it is not only the burial hypothesis but other hypotheses that suffer from complete excavation. Researchers concerned with preserving all possible information have sometimes taken extraordinary measures to remove and study possible burials at high resolution in the laboratory. Such was the case of the Shanidar IV burial removed from the site and transported in a plaster jacket by Solecki, which led to the disruption and loss of internal stratigraphic information (Pomeroy et al. 2020). Arguably, the current state of the art is full excavation with partial preparation, such as that undertaken at Panga ya Saidi (Martinón-Torres et al. 2021). But again, any future attempt to reinterpret or test the hypothesis of burial must rely on the adequacy of documentation as the original context has been removed.

      In our decision to leave material in place as much as possible, we are expanding upon standard practice to leave witness sections and unexcavated areas for future research. The situation is novel, representing possible burials by a nonhuman species, and that makes it doubly important in our opinion to be conservative in not fully exhuming the skeletal material from its context. We anticipate that many other researchers, including future investigators, will suggest additional methods to further test the hypothesis of burial, something that would be impossible if we had excavated the features in their entirety prior to publishing a description of our work. We believe strongly that our ethical responsibility is to publish the work and the most likely interpretation while leaving as much evidence in place as possible to enable further testing and replication. We welcome the suggestions of additional methods/analyses to test the H. naledi burial hypothesis.

      This being said, we also observe that total exhumation would not resolve the concerns raised by the reviewers. The recommendation of total exhumation is in pursuit of a full account of all skeletal material present and its preservation and spatial situation, in order to demonstrate that they conform to body positions comparable to human burials. As has been highlighted in forensic casework, the excavation of an inhumation feature does not necessarily provide an accurate spatial or anatomical manifest of the stratigraphical relationships between the body, encapsulating matrix, and any cut present due to preservational, taphonomic and operational factors (Dirkmaat and Cabo, 2016; Hunter, 2014). In particular, in cases where skeletal elements are highly fragmented, friable, or degraded (such as through bioerosion), complete excavation, even under controlled laboratory conditions, may destroy bone and severely limit skeletal identification (Henderson, 1987; Hochrein, 2002; Owsley and Compton, 1997), particularly in elements where the ratio of trabecular to cortical bone is high (Darwent and Lyman, 2002; Lyman, 1994). As such, non-invasive methods of 3D and 4D modelling (preservation in situ) are often considered preferable to complete necropsy or excavation (preservation by record) where appropriate (Bolliger and Thali, 2009; Dell’Unto and Landeschi, 2022; Randolph-Quinney et al., 2018; Silver, 2016).

      The test of burial is not primarily positional, but taphonomic and geological. The position and number of bones can elaborate on process-driven questions of decay and destruction in the burial environment, or post-mortem modification, but are not singularly indicative of whether the remains were intentionally buried – the post-mortem narrative of all the processes affecting the cadaveric island is required (Knüsel and Robb, 2016). In previous cases, researchers have disputed or accepted the hypothesis of intentional hominin burial based upon assumptions about how modern humans or Neandertals would have positioned bodies, with the idea that some positions reflect ritual intent while others do not. But applying such assumptions is unjustifiable, particularly for a species like H. naledi, whose culture may have differed fundamentally from our own. Our work acknowledges that the present evidence does not enable a full reconstruction of the burial positions, but it does show that fleshed remains were encased in sediment prior to decomposition of soft tissue, and that subsequent spatial changes can be most parsimoniously explained by natural decomposition within sedimentary matrix contained within a burial feature (after Green, 2022; Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022). If the argument is that extraordinary claims require extraordinary evidence, we feel that the evidence documents excavation and interment (and will do so more clearly in the revision), and the fact that the remains do not match a “typical” human burial in body positioning is not in itself evidence that these are not H. naledi burials.

      We feel that the reviewers (in keeping with many palaeoanthropologists) have a clear idea of what they “think” a burial should look like in an idealised sense, but this platonic ideal of burial form is not matched by the extensive literature in archaeothanatology, funerary archaeology and forensic science which indicates enormous variability in the activity, morphology and post-mortem system experienced by the human body in cases of interment and body disposal (e.g. Aspöck, 2008; Boulestin and Duday, 2005 and 2006; Connolly et al., 2005; Channing and Randolph-Quinney, 2006; Cherryson, 2008; Donnelly et al., 1999; Finley, 2000; Hunter, 2014; Parker Pearson, 1999; Randolph-Quinney, 2013). Decades of experience in the identification, recovery and interpretation of clandestine, deviant, and non-formal burials indicate that the platonic ideal is rare and, in many contexts, the exception (Cherryson, 2008; Parker Pearson, 1999). This variability is particularly relevant to morphological traits in burial context, such as the informal nature of the grave cut in plan and section, shallow burial depth, and initial disposition of body (placement) during the early post-mortem period. These might run counter to the expectations of reviewers or others referencing the fossil hominin record, but are well accepted within the communities of researchers investigating Holocene archaeological sites and forensic contexts.

      It is encouraging to see reviewers beginning to incorporate the extensive (often experimentally derived) literature from archaeothanatology and forensic taphonomy in their deliberations, and we will be taking these comments on board going forward. In particular, we acknowledge reviewers’ comments and the need to construct a more detailed post-mortem narrative, accounting for joint disarticulation (labile versus persistent joints etc), displacement, and final disposition of elements within the burial space. As such we will incorporate the hierarchy of decomposition (rank order disarticulation), associations between regions of anatomical association, areas of disassociation, and the voids produced during decomposition (after Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022) into our narrative. In doing so we acknowledge the tensions between the inductive archaeothanatological narrative-driven approach (e.g. Duday, 2005 & 2009) versus robust decomposition data derived from human forensic taphonomic experimentation recently articulated by Schotsmans and colleagues (2022) - noting that we will highlight comparative data based on forensic experimental casework and actualistic modelling over inductive intuitive approaches which come with significant evidential shortcomings (Bristow et al. 2011).

      Finally, from a taphonomic perspective it is worth pointing out to reviewers that we have already addressed the lack of taphonomic evidence for carnivore involvement in the formation of the Dinaledi assemblage (Dirks et al., 2016). The absence of any carnivore-induced bone surface modifications, the patterns of skeletal part representation, and the total absence of any carnivore remains within the Dinaledi Chamber (following Kuhn and colleagues, 2010) lead us to reject carnivores as possible vectors of body accumulation within the Dinaledi Chamber and Hill Antechamber.

      Reviewers suggest that without a date derived from geochronological methods, the engravings cannot be associated with H. naledi, and that it is possible (or probable) that the engravings were done in the recent past by H. sapiens. This suggestion neglects the context of the site. We have previously documented the structure and extremely limited accessibility of the Dinaledi subsystem. This subsystem was not recorded on maps of the documented Rising Star Cave system prior to our work and its discovery by our teams. Furthermore, there is no evidence of prehistoric human activity in the areas of the cave related to possible subterranean entrances. There is no evidence that humans in the past typically ventured into extreme spaces like those of Rising Star. It is clear from the presence of the remains of many individuals that H. naledi ventured into these spaces again and again. It is likely that H. naledi moved through these spaces more easily than humans do based on their physique. We show that the engravings overlay each other, suggesting multiple engraving events. These engravings took time and effort, and the only evidence for use of the Dinaledi subsystem by any hominin is by H. naledi. The context leads to the null hypothesis that H. naledi made the marks. In our revision, we will elaborate on this argument to clarify the evidence for our stance on this hypothesis. Several reviewers took issue with the title of the engraving paper as we did not insert a qualifier in front of the suggested date range for the engravings. We deliberately left out qualifying language so that the title took the form of a testable hypothesis rather than a weak assertion. Should future work find the engravings were not produced within this time range, then we will restate this hypothesis.

      Finally, with regard to the engravings, we have chosen to report them because they exist. Not reporting the presence of engraved marks on the walls of a cave above hypothesized burials would be tantamount to leaving relevant evidence out of the description of an archaeological context. We recognize and state in our manuscript that these markings require substantial further study, including attempts at geochronological dating. But the current evidence is clearly relevant to the archaeological context of the subsystem. We take a similar stance with reporting the presence of the tool-shaped artefact near the hand of the H. naledi skeleton in the Hill Antechamber. It is evident that this object requires further study, as we stated in our manuscript, but again omitting it from our study would be leaving out relevant evidence.

      Some have suggested that the null hypothesis should be that all of these observed circumstances are of natural origin. Our team took this approach in our early investigation of the Dinaledi subsystem (Dirks et al. 2015). We adopted the null hypothesis that the geological processes involved in the accumulation of H. naledi skeletal remains were “natural” (e.g., non-naledigenic involvement), and we were able to reject many alternative explanations for the assemblage, including carnivore accumulation, “death trap” accumulation, and fluvial transport of bodies or bones (Dirks et al. 2015). This led us to the hypothesis that H. naledi were involved in bringing the bodies into the spaces where they were found. But we did not hypothesize their involvement in the formation of the deposit itself beyond bringing the bodies to the location.

      This approach seems conservative. It followed the traditional view that small-brained hominins do not engage in cultural practices. But we recognize in hindsight that this null hypothesis approach did harm to our analyses. It impeded us from recognizing within our initial excavations of the puzzle box area and other excavations between 2014 and 2017 that we might be encountering remains that were intrusive in the sedimentary floor of the chamber. If we had approached the accumulation of a large number of hominins from the perspective of the null hypothesis being that the situation was likely cultural, we perhaps would have collected evidence in a slightly different manner. We certainly note that if the Dinaledi system had been full of the remains of modern humans, there would have been little doubt that the null hypothesis would have been that this was a cultural space and not a “natural space”. We therefore respectfully disagree with the reviewers who continue to support the idea that we should approach hominin excavations with the null hypothesis that they will be natural (specifically non-cultural) in origin. If excavations continue with this mindset, we believe that potential cultural evidence is almost certain to be lost.

      There has been a gradient across paleoanthropological excavations, archaeological work, and forensic investigation, with increasing precision of context. The reality is that the recording precision and frame of approach are typically different in most paleontological excavations than in those related to contemporary human remains. If anything comes from the present discussion of whether the Dinaledi system is a burial site for H. naledi or not, we hope that by taking seriously the possibility of deep cultural dynamics of hominins, we will encourage other teams to meet the highest standards of excavation in order to preserve potential cultural evidence. Given H. naledi’s cranial capacity, we suggest that even very early hominin skeletal assemblages should be re-examined, if there is sufficient evidence or records available. These would include examples such as the A.L. 333 Au. afarensis site (the so-called First Family site in Hadar, Ethiopia), the Dikika infant skeleton, WT 15000 (Turkana Boy) and even A.L. 288 (Lucy), as such unusual taphonomic situations where skeletons are preserved cannot be simply explained away as “natural” in origin, based solely on the cranial capacity and assumed lack of cognitive and cultural complexity of the hominins, as emphasized by us in Fuentes et al. (2023). We are not the first to observe that some very early hominin situations may represent early mortuary activity (Pettitt 2013), but we would advocate a step further. We suggest it may be damaging to take “natural accumulation” as the standard null hypothesis for hominin paleoanthropology, and that it is more conservative in practice to engage remains with the null hypothesis of possible cultural formation.

      We are deeply grateful for the time and effort all of the 8 reviewers (across three reviews) have taken with this work. We also acknowledge the anonymous reviewers from previous submissions whose opinions and comments will have made the final iterations of these manuscripts better for their efforts. As this process is rather public and includes commentary outside of the eLife forum, we ask that the efforts of all 37 authors and 8 reviewers involved be respected and that the discourse remain professional in all venues as we study this fascinating and quite complex occurrence. We appreciate also the efforts of members of the public who have engaged with this relatively new process where preprints are posted prior to the reviews, allowing comments and interactions from colleagues and the public who are normally not part of the internal peer review process. We believe these interactions will make for better final papers. We feel we have met the standards of demonstrating burials in H. naledi and that the engravings are most likely associated with H. naledi. However, given the reviews, we see many areas where our clarity, context, and analyses were less strong than they can be. With the clarifications and additions taken on board through these review processes, the final papers will be stronger and clearer. We recognize that this is an ongoing process of scientific investigation and further work will allow continued, and possibly better, evaluation of these hypotheses and others.

      Lee R Berger, Agustín Fuentes, John Hawks, Tebogo Makhubela

      Works cited:

      • Aspöck, E. (2008). What Actually is a ‘Deviant Burial’?: Comparing German-Language and Anglophone Research on ‘Deviant Burials.’ In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books.  pp 17–34.

      • Bolliger, S.A. & Thali, M.J. (2009). Thanatology. In S.A. Bolliger and M.J. Thali (eds) Virtopsy Approach:  3D Optical and Radiological Scanning and Reconstruction in Forensic Medicine. Boca Raton: CRC Press. pp 187-218.

      • Boulestin, B. & Duday, H. (2005). Ethnologie et archéologie de la mort: de l’illusion des références à l’emploi d’un vocabulaire. In: C. Mordant and G. Depierre (eds) Les Pratiques Funéraires à l’Âge du Bronze en France. Actes de la table ronde de Sens-en-Bourgogne. Paris: Éditions du Comité des Travaux Historiques et Scientifiques. pp. 17–30.

      • Boulestin, B. & Duday, H. (2006). Ethnology and archaeology of death: from the illusion of references to the use of a terminology. Archaeologia Polona 44: 149–169.

      • Bristow, J., Simms, Z. & Randolph-Quinney, P.S. (2011). Taphonomy. In S. Black and E. Ferguson (eds.) Forensic Anthropology 2000-2010. Boca Raton, FL: CRC Press. pp 279-318.

      • Channing, J. & Randolph-Quinney, P.S. (2006). Death, decay and reconstruction: the archaeology of Ballykilmore Cemetery, County Westmeath. In J. O’Sullivan and M. Stanley (eds.) Settlement, Industry and Ritual: Archaeology. National Roads Authority Monograph Series No. 3. Dublin: NRA/Four Courts Press. pp 113-126.

      • Cherryson, A. K. (2008). Normal, Deviant and Atypical: Burial Variation in Late Saxon Wessex, c. AD 700–1100. In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books. pp 115–130.

      • Connolly, M., F. Coyne & L. G. Lynch (2005). Underworld: Death and Burial in Cloghermore Cave, Co. Kerry. Bray, Co. Wicklow: Wordwell.

      • Darwent, C. M. & R. L. Lyman (2002). Detecting the postburial fragmentation of carpals, tarsals and phalanges. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press. pp 355-378.

      • d’Errico, F., & Backwell, L. (2016). Earliest evidence of personal ornaments associated with burial: The Conus shells from Border Cave. Journal of Human Evolution, 93, 91–108.

      • De Villiers. H. (1973). Human skeletal remains from Border Cave, Ingwavuma District, KwaZulu, South Africa. Annals of the Transvaal Museum, 28(13), 229–246.

      • Dell’Unto, N. and Landeschi, G. (2022). Archaeological 3D GIS. London: Routledge.

      • Dibble, H. L., Aldeias, V., Goldberg, P., McPherron, S. P., Sandgathe, D., & Steele, T. E. (2015). A critical look at evidence from La Chapelle-aux-Saints supporting an intentional Neandertal burial. Journal of Archaeological Science, 53, 649–657.

      • Dirkmaat, D. C., & Cabo, L. L. (2016). Forensic archaeology and forensic taphonomy: basic considerations on how to properly process and interpret the outdoor forensic scene. Academic Forensic Pathology 6, 439–454.

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. ELife, 4, e09561.

      • Dirks, P.H.G.M., Berger, L.R., Hawks, J., Randolph-Quinney, P.S., Backwell, L.R., and Roberts, E.M. (2016). Comment on “Deliberate body disposal by hominins in the Dinaledi Chamber, Cradle of Humankind, South Africa?” [J. Hum. Evol. 96 (2016) 145-148]. Journal of Human Evolution 96:  149-153.

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. ELife, 6, e24231.

      • Donnelly, S., C. Donnelly & E. Murphy (1999). The forgotten dead: The cíllíní and disused burial grounds of Ballintoy, County Antrim. Ulster Journal of Archaeology 58, 109-113.

      • Duday, H. (2005). L’archéothanatologie ou l’archéologie de la mort. In: O. Dutour, J.-J. Hublin and B. Vandermeersch (eds) Objets et Méthodes en Paléoanthropologie. Paris: Comité des Travaux Historiques et Scientifiques. pp. 153–215.

      • Duday, H. (2009). Archaeology of the Dead: Lectures in Archaeothanatology. Oxford: Oxbow Books.

      • Finley, N. (2000). Outside of life: Traditions of infant burial in Ireland from cillin to cist.  World Archaeology 31, 407-422.

      • Gargett, R. H. (1999). Middle Palaeolithic burial is not a dead issue: The view from Qafzeh, Saint-Césaire, Kebara, Amud, and Dederiyeh. Journal of Human Evolution, 37(1), 27–90.

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015.

      • Gómez-Olivencia, A., & García-Martínez, D. (2019). New postcranial remains from the Roc de Marsal Neandertal child. PALEO. Revue d’archéologie Préhistorique, 30–1, 30–1.

      • Green, E.C. (2022). An archaeothanatological approach to the identification of late Anglo-Saxon burials in wooden containers. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 436-455.

      • Henderson, J. (1987). Factors determining the state of preservation of human remains. In A. Boddington, A. Garland and R. Janaway (eds). Death, Decay and Reconstruction: Approaches to Archaeology and Forensic Science. Manchester: Manchester University Press. pp 43-54.

      • Hunter, J. R. (2014). Human remains recovery: archaeological and forensic perspectives. In C. Smith (ed). Encyclopedia of Global Archaeology. New York: Springer New York. pp 3549-3556.

      • Hochrein, M. (2002). An Autopsy of the Grave: Recognizing, Collecting and Preserving Forensic Geotaphonomic Evidence. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press: 45-70.

      • Knüsel, C.J. & Robb, J. (2016). Funerary taphonomy: An overview of goals and methods. Journal of Archaeological Science: Reports 10, 655-673.

      • Kuhn, B.F., Berger, L.R. & Skinner, J.D. (2010). Examining criteria for identifying and differentiating fossil faunal assemblages accumulated by hyenas and hominins using extant hyenid accumulations. International Journal of Osteoarchaeology 20, 15-35.

      • Lyman, R. (1994). Vertebrate Taphonomy. Cambridge, Cambridge University Press.

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), 95–100.

      • Mickleburgh, H.L & Wescott, D.J. (2018). Controlled experimental observations on joint disarticulation and bone displacement of a human body in an open pit: implications for funerary archaeology. Journal of Archaeological Science: Reports 20: 158-167.

      • Mickleburgh, H.L., Wescott, D.J., Gluschitz, S. & Klinkenberg, V.M. (2022). Exploring the use of actualistic forensic taphonomy in the study of (forensic) archaeological human burials: An actualistic experimental research programme at the Forensic Anthropology Center at Texas State University (FACTS), San Marcos, Texas. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 542-562.

      • Owsley, D. & B. Compton (1997). Preservation in late 19th Century iron coffin burials. In W. Haglund and M. Sorg (eds). Forensic Taphonomy: The Postmortem Fate of Human Remains. Boca Raton, FL, CRC Press: 511-526.

      • Parker Pearson, M. (1999). The Archaeology of Death and Burial. College Station: Texas A&M University Press.

      • Pettitt, P. (2013). The Palaeolithic Origins of Human Burial. Routledge.

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26.

      • Randolph-Quinney, P.S. (2013). From the cradle to the grave: the bioarchaeology of Clonfad 3 and Ballykilmore 6. In N. Brady, P. Stevens and J. Channing (eds.). Settlement and Community in the Fir Tulach Kingdom. Dublin: National Roads Authority Press. pp A2.1-48.

      • Randolph-Quinney, P.S., Haines, S. and Kruger, A. (2018). The use of three-dimensional scanning and surface capture methods in recording forensic taphonomic traces: issues of technology, visualisation, and validation. In: W.J. M. Groen and P. M. Barone (eds). Multidisciplinary Approaches to Forensic Archaeology. Berlin: Springer International Publishing, pp. 115-130.

      • Rendu, W., Beauval, C., Crevecoeur, I., Bayle, P., Balzeau, A., Bismuth, T., Bourguignon, L., Delfour, G., Faivre, J.-P., Lacrampe-Cuyaubère, F., Tavormina, C., Todisco, D., Turq, A., & Maureille, B. (2014). Evidence supporting an intentional Neandertal burial at La Chapelle-aux-Saints. Proceedings of the National Academy of Sciences, 111(1), 81–86.

      • Sandgathe, D. M., Dibble, H. L., Goldberg, P., & McPherron, S. P. (2011). The Roc de Marsal Neandertal child: A reassessment of its status as a deliberate burial. Journal of Human Evolution, 61(3), 243–253.

      • Silver, M. (2016). Conservation Techniques in Cultural Heritage. In E. Stylianidis and F. Remondino (eds) 3D Recording, Documentation and Management of Cultural Heritage. Dunbeath: Whittles Publishing. pp 15-106.

      • Schotsmans, E.M.J., Georges-Zimmermann, P., Ueland, M. and Dent, B.B. (2022). From flesh to bone: Building bridges between taphonomy, archaeothanatology and forensic science for a better understanding of mortuary practices. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 501-541.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Point-by-Point Response (author’s replies in plain text)


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Silao et al make the intriguing observation that yeasts that are generally considered less pathogenic are unable to catabolize proline, unlike Candida albicans. They then, in Candida albicans, construct mutants defective for the two key enzymes (Put1, Put2) required to convert proline to glutamate, which they show to be essential for proline utilization as an energy (carbon) and nitrogen source. The authors proceed to untangle the regulatory aspects of proline degradation, including the respective cellular localization of its key enzymes. They then make the important discovery that strains lacking either Put1 or Put2 suffer from a proline-dependent growth defect, which they attribute to resulting defects in mitochondrial metabolism.

      The manuscript then goes on to analyze a broad range of infection models including: reconstituted human epithelial skin model, Drosophila, mouse systemic infections, organ colonization in these mice (kidney, spleen, brain, liver and histochemistry of the kidneys) as well as survival when incubated with cultured human neutrophils. Finally, they use yeast cells constitutively expressing yEmRFP (so that yeasts can be distinguished from other host cells) and coated with FITC before incubation with the host cells (which coats the wall of the original cells, but does not spread to progeny) and they go on to perform an impressive set of analyses of C. albicans growth within mouse kidneys both in vivo and ex vivo, exploiting an implanted window together with intravital imaging with a two-photon microscope at different time points. The system is impressive and visualizes tissue invasion by hyphal cells beautifully. They then compare the intravital images from WT and put2-/- cells and show that, as in vitro, put2-/- cells do not form filaments and do not show extensive invasion of the kidney tissue. While the in vivo aspect of the study includes many different models, it finds defects in virulence for different subsets of put mutants and the relative importance of filamentation vs proline utilization for virulence is not conclusively resolved.

      Overall, this is an important and timely manuscript, which significantly contributes to the understanding of how proline metabolism intersects with yeast fitness in the context of infections. However, there are several major concerns regarding some of the conclusions drawn from the study. In addition, some general recommendations that would improve the manuscript are provided.

      Specifically, the manuscript provides a very detailed description of experiments and observations. However, in several parts it is difficult to follow and the reader needs more guidance about the logic involved in reaching conclusions. Specifically, several aspects of the paper are written for experts in Candida (yeast) metabolism. Here, explaining the rationale for some of the experiments, and providing more background information that is not obvious to a non-expert, is required.

      In particular, writing a clear and measured summary sentence at the end of each paragraph and a conclusion paragraph that summarizes key findings in simple terms would help make the manuscript more digestible for readers.

      In addition, the impressive microscopy and broad range of in vivo experiments are comprehensive but only add incremental information relevant to proline metabolism: that filamentous growth in vivo and virulence are reduced in cells carrying some mutations in one or more put genes. However, this broad sweep of model systems and the development of the in vivo imaging system might have more impact in a separate paper focused on the real-time in vivo visualization of kidney invasion.

      We thank Reviewer 1 for the extensive list of comments and have endeavored to adjust the manuscript to address all of the major and minor concerns. It is evident that Reviewer 1 clearly understood the significance of the work and we appreciate that the comments are presented in a positive manner intended to improve our manuscript.

      Major comments:

      1. The main finding that impressed this reviewer is that "removing the ability to catabolize proline, in an organism that evolved to catabolize it, leads to (growth) defects". This point could be better highlighted throughout the manuscript.

      Thanks for the comment. We will adjust the text to reflect this suggestion.

      2. The authors show that deletion strains for proline metabolism have defects that are important for in vivo pathogenicity. This is an important finding. However, as the manuscript reads now, it suggests that the main findings are that the ability to use proline in the respective host niche is key. Mechanistically, the manuscript revolves primarily around defects that arise when deleting PUT1 and/or PUT2 (i.e., an "unknown" toxicity of proline in the case of put1-/- (or put1-/- put2-/-) and the additional P5C-dependent toxicity for put2-/- mutants; see below).

      Yes, the reviewer is correct in that we believe that proline catabolism is necessary to initiate and power hyphal growth, which is coupled to virulence. We have previously shown that upon phagocytosis by macrophages, the expression of Put1, Put2 and even Gdh2 is induced in phagocytized C. albicans cells, which is consistent with the analysis shown in Fig. 2D and Fig. S2B. Consequently, proline, or an amino acid that is metabolized via the proline catabolic pathway, must be present in the phagosomal compartment. However, as we now report, proline inhibits growth of cells lacking the capacity to catabolize it. Although we cannot differentiate the cause of reduced virulence in put mutants, i.e., the lack of energy due to the inability to catabolize proline vs proline toxicity, proline catabolism is clearly important and a robust indicator of virulence. As for point 1, we have adjusted the text to make this clearer.

      3. In order to claim that catabolizing proline promotes pathogenicity (as opposed to the alternative hypothesis that the inability to catabolize proline leads to the observed defects), additional experiments would be required. For example, the put mutants would need to be compared with mutants that significantly reduce/impair proline uptake, such as the referenced gnp2 mutant (Garbe et al 2022). While the finding that less pathogenic yeast species are unable to catabolize proline is both intriguing and important, as presented it remains a loose, non-quantitative correlation that only tangentially addresses the question of whether "proline catabolism is key for pathogenicity".

      We have in fact already shown that proline uptake is required to induce filamentation (Martínez and Ljungdahl 2003, Fig. 6). The main point of our current work, which we believe is important and of general interest, is that C. albicans is adapted to use proline as sole energy source, which reflects the environment (humans) in which it evolved. See the response to point 2. Interestingly, the differences in the expression levels of Put1 (off in the absence of proline, induced robustly by proline) and Put2 (low level of constitutive expression, induced robustly by proline) suggest that cells are primed to decrease the likelihood of becoming inhibited by P5C, i.e., the constitutive expression of Put2 is able to ameliorate the potential toxicity of P5C. Regardless, the finding that put1 and put2 mutants exhibit significantly reduced virulence in two host models provides clear support for proline catabolism being key for C. albicans pathogenicity.

      4. 238 onwards: The conclusion that "the primary growth inhibitory effect of proline is linked to catabolic intermediates formed by Put1 and that are metabolized further by Put2" does not appear to be fully supported by the evidence. Addition of proline to put1 mutants already reduced OD600 by ~50% (Figure 2); and is further reduced to ~10% when put2 is deleted. This implies that there are two inhibitory effects of proline, not one primary one. At the least, this option should be discussed, including why deletion of PUT1 leads to proline toxicity. The latter is not clear-is it that too much proline accumulates in the cell and this accumulation is toxic? If this is the case, the effect would be expected to be proline concentration dependent. Performing a relatively simple experiment as performed for the put2 mutant (Fig. 3 / S3F) may clarify this issue. Particularly, if the experiment would be coupled with intracellular quantification of proline.

      Precisely! Proline toxicity is evident even in put1 mutants, clearly suggesting that proline, without being further catabolized, exerts a growth inhibitory effect (Fig. 3A). We traced this inhibitory effect to decreased mitochondrial respiration (Fig. 3E). There are two parameters to consider regarding the inhibitory effects of proline in put2 mutants. First, the presence of proline induces the expression of Put1 independent of Put2 (Fig. S2C); consequently, the levels of the toxic intermediate P5C increase (Fig. 3B). P5C has previously been postulated to inhibit mitochondrial respiration, which is well-aligned with our analysis (Fig. 3E; see response Point 5). We initially tested whether a proline-P5C cycle, suggested by work in mammalian cells, would play a role in proline-mediated toxicity; however, supplying high levels of glutamate (which according to work in mammalian cells should be efficiently converted to cytoplasmic proline) did not increase cytoplasmic pools of proline, as we did not see glutamate-enhanced Put1 expression (Fig. 2D, S2A, S2B). We agree with the reviewer with respect to the suggested experiment, and have monitored growth of put1 in media with different proline concentrations. The results are incorporated in the revised Fig. 3.

      5. The caption "P5C mediates a respiratory block" is misleading, as the evidence is not that compelling: Although P5C increases in put2, but not in put1 mutants, and given that both single mutants experience a proline-dependent respiratory defect (Fig. 3E), the results suggest a more complex relationship.

      Previous work using pure P5C (Ref. 36; Nishimura et al) showed that it targets respiration, hence the caption “respiratory block” in the header. In mammals, PRODH (Put1) physically interacts with mitochondrial respiratory complex II in the inner mitochondrial membrane (lines 89-90), while P5CDH (Put2) is in the matrix. The put1 mutation might affect basal activity of the respiratory chain, resulting in lowered respiration, which may be compounded when proline accumulates in the mitochondria. The inhibitory mechanism remains unknown, and going forward we have begun characterizing various GFP-tagged respiratory complex components in put1 mutants and in strains co-expressing Put1-RFP (for interaction studies). These results are outside the scope of the current work.

      6. The virulence assays and in vivo experiments do not present a unifying view: in Drosophila put2∆∆ is less virulent than put1∆∆, which appears similar to put3∆∆. Given that put2 mutants grow slowly, likely because of P5C inhibition, this seems logical. However, in mice, put3∆∆ remains highly virulent while put1∆∆ and put2∆∆ results for survival are mixed. Furthermore, in 4 mouse organs, put1∆∆ and put2∆∆ are not significantly different from one another but are different from wt, while put3∆∆ has no significant reduction in CFU. Kidney histology shows very little invasion by put1 and put2 and more by put3, but visually put3 appears to invade much less than the WT, and the human neutrophil experiment shows effects of put2 or put3 but not put1. This leaves the reader rather confused. It may be worth discussing the reasons for different results in different models. Is the availability of proline in each of the organisms and organs similar?

      We thank the reviewer for these thoughtful observations; however, we note that all of the diverse assay systems employed provide a clear and consistent indication that the inability to completely catabolize proline significantly reduces virulence. This is well-aligned with our previous data regarding the need for proline catabolism to escape macrophages (Silao et al, 2019). The requirement for Put3 may not be very strict since the Put enzymes are still expressed in the absence of Put3 (Fig. 2D/S2A/S2B), indicating the activity of additional regulatory factors; hence, this may explain why the put3 strain behaves like wildtype in the murine model (Fig. 5B). The dispensability of Put3 in the murine model could be due to a lower neutrophil count and to murine neutrophils exhibiting a lower affinity for fungal cells as compared to human blood (Machata et al., 2020, Front Immunol). The more pronounced requirement of Put3 to survive in whole human blood and when co-cultured with human neutrophils could indeed be linked to the need to rapidly derepress PUT1/PUT2 (and even other target genes), as suggested by the global RNASeq analysis showing that proline catabolism is a core response of C. albicans during neutrophil interaction (Niemiec MJ et al., 2017, BMC Genomics). In Drosophila, a well-established model to study innate immunity, the presence of hemocytes that fulfill the equivalent functions of neutrophils and macrophages could explain the increased requirement for Put3. In summary, although it is impossible to know the precise mechanistic basis underlying the observed differences, we believe it unreasonable to expect that all mutations behave identically in each virulence model. In fact, differences considered trivial, such as the choice of mouse background, can have profound effects on virulence. Presumably the differences we report are due to the specific nutrient composition (proline and metabolites feeding into the proline catabolic network) and physical parameters intrinsic to each model. For instance, Lionakis et al. (2013) suggested that filamentation occurs faster in the kidney compared to other organs, such as the liver/spleen, indicating the presence of kidney-specific cues that drive infections of this organ.

      7. The ex vivo and in vivo analysis of the dynamics of C. albicans growth in the host is visually impressive, but it distracts from the focus of the paper and the metabolic findings. Showing that put mutant cells do not form filaments in vivo (as in vitro) does not add much conceptually to the paper. Furthermore, this lovely advance in in vivo visualization is lost at the end of this paper and the authors should consider whether it might fit better in a manuscript that could really highlight the in vivo visualization approach.

      We appreciate this comment. Indeed, our lab is at an advanced stage of completing a manuscript focused on the use of intravital and clearing microscopy to follow the onset of an upper urinary tract infection (UTI) in a murine candidemia model. However, our ability to visualize in 3D the onset of an infection in a living host is not a trivial achievement and we were impressed that it provided a clear answer as to whether a single C. albicans cell can initiate an infection and undergo morphogenesis leading to hyphal growth. Furthermore, we tested a put2 strain, the growth of which is highly sensitive to the presence of proline, and found that it did not exhibit filamentous growth. This clearly shows that cells colonizing the kidney are exposed to an environment that requires a functional proline catabolic network to exhibit filamentous growth, a characteristic of renal infections. Our results are consistent with the kidney being a metabolic hub for arginine/proline biosynthesis, which likely increases the levels of these amino acids in this organ.

      8. The discussion of cells stained with FITC and expressing yEmRFP does not clearly point out that the FITC is only an indicator for those cells that were used to inoculate the tissue and that finding cells without FITC indicates that they are mitotic progeny, indicating that they have been dividing. The authors clearly understand this, but a naive reader may miss this important point if it is not stated explicitly.

      We have adjusted the text to explicitly clarify this.

      Minor comments:

      1. Throughout: what is the distinction between utilization of proline for C or for energy? These terms seem to be used interchangeably.

      C. albicans is a heterotroph that can use proline to generate biomass (gluconeogenesis, etc.), and its catabolism generates sufficient amounts of ATP to power growth. Thus, when proline is used as sole carbon source, it also serves as the sole energy source. In the text, we have tried to be consistent, using “carbon source” when discussing proline as a component of growth media, and “energy source” when discussing proline catabolism.

      2. Introducing the schematic in Fig. 2A at the beginning of Figure 1, would help explain proline catabolism before delving into the growth experiments that rely upon this framework. This should include an explanation, for readers less familiar with the metabolic issues, of the main limitations to catabolizing proline, and the key issues for being able to use proline for nitrogen, carbon, and energy (potentially indicated in the overview figure, e.g. pointing towards gluconeogenesis etc.).

      We have considered the reviewer's suggestion; however, we believe that the placement of the schematic in Fig. 2 is appropriate as is, where it will hopefully enable readers to more readily grasp the strain construction and experiments documented in Fig. 2.

      3. Saccharomyces can only grow on proline as a nitrogen source, but not as energy/carbon source. Could the authors briefly mention or discuss why this is the case? This is not clearly apparent after reading the manuscript and it leaves the reader confused and trying to understand if the fact that proline is required for carbon utilization is a new finding of this paper or was already known. Do the authors think this is tied to the presence of complex 1 components in C. albicans that are not found in S. cerevisiae? Is this consistent for the pathogenic, but not the non-pathogenic yeasts analyzed in figure 1?

      We have adjusted the text to clarify our thoughts regarding this. Indeed, we do believe that a major reason for the ability of C. albicans to efficiently grow using proline as a sole energy source is the presence of Complex I. However, C. glabrata appears to be able to grow well using proline as sole energy source despite apparently lacking Complex I. Consequently, alternative NADH dehydrogenases exist in C. glabrata, but how this is coupled to energy metabolism will require additional work that is outside the scope of the present work.

      4. 100: While Gdh2 is apparently an important enzyme for generating ammonium, why is it not necessary for macrophage escape and virulence as shown in reference 18? A recent paper from Garbe et al (ref 12) suggests that Gnp2 is the major proline permease in C. albicans and what is known, and not known, about proline uptake would be good to mention, given that PUT gene functions require that proline enters the cells.

      We have recently shown that ammonia generation by Gdh2 is dispensable for macrophage escape and documented that phagosome alkalinization is not a requisite for the induction of hyphal growth (Silao et al. 2020). We have referred to the work of Garbe et al., which is consistent with our previous work (Martínez and Ljungdahl, 2004) where we reported that proline-dependent filamentation is dependent on Csh3. Csh3 is an ER membrane-localized chaperone responsible for catalyzing the proper folding of amino acid permeases; in csh3 null mutant strains, amino acid permeases accumulate in the ER as non-functional unfolded aggregates. Consistently, we have tested and found that proline-induced Put2-GFP expression is dependent on Csh3 (unpublished), clearly establishing that the regulatory effects of proline are dependent on its uptake. We have not generated a gnp2-/- strain, but suspect that we could find growth conditions where such a mutant would be refractory to proline induction. We have adjusted the text to include this information.

      5. 116: Is the "low sugar environment of the host" referring to a specific niche, such as the GI tract, or human blood? Compared to most natural environments, glucose is abundant in the host, e.g., at ~5 mM, it is the most abundant metabolite in blood, and similarly, in the GI tract, levels can go beyond 50 mM glucose (see e.g. PMIDs 34371983, 21359215). Or is this comment indicating that the in vivo sugar concentration is lower than that in common lab growth media? Please spell out the niche/concentration for clarification - and compare that to other niches that are considered "high sugar environments".

      We have adjusted the text to clarify our statement. The natural environment of C. albicans is the human host. Virulent infections do not arise within the GI tract, with its high sugar content, but rather result when C. albicans cells successfully cross into the blood, which has a relatively low glucose concentration (5 mM) that, importantly, does not effectively repress mitochondrial function. A major point of our recent work is that laboratory experiments with C. albicans growing on YPD or SD with 2% glucose (111 mM) examine growth of cells with repressed mitochondrial functions.
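
      The 111 mM figure follows from a simple unit conversion. A minimal sketch of the arithmetic, assuming that "2% glucose" means 2% (w/v), i.e., 20 g/L, and taking 180.16 g/mol as the molar mass of anhydrous glucose (the function name is hypothetical and used only for illustration):

```python
# Convert a % (w/v) concentration to millimolar.
# Assumption: "2% glucose" means 2 g per 100 mL (w/v); 180.16 g/mol is the
# molar mass of anhydrous glucose (C6H12O6).
GLUCOSE_MOLAR_MASS = 180.16  # g/mol


def percent_wv_to_mM(percent_wv: float, molar_mass: float) -> float:
    """Return the millimolar concentration of a % (w/v) solution."""
    grams_per_litre = percent_wv * 10.0  # 1% (w/v) = 10 g/L
    return grams_per_litre / molar_mass * 1000.0  # mol/L -> mmol/L


print(round(percent_wv_to_mM(2.0, GLUCOSE_MOLAR_MASS)))  # 2% glucose
print(round(percent_wv_to_mM(0.2, GLUCOSE_MOLAR_MASS)))  # 0.2% glucose
```

      Under these assumptions, 2% glucose corresponds to about 111 mM, roughly 20-fold above the ~5 mM found in blood.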

      6. 123: "proline as sole energy source" - suggest "is the source of carbon, nitrogen, and energy"

      The text is adjusted (see response to Minor Point 1).

      7. 142: it is worth noting to readers that C. neoformans is a basidiomycete and thus VERY distant from the other yeasts studied here-it is in a different major phylum of fungi.

      Again, thanks for this suggestion; the text has been adjusted. We included C. neoformans since the role of proline catabolism has been characterized and linked to its pathogenicity (reviewed in Christgen and Becker, 2018, Antioxid Redox Signal, Ref. 1).

      8. 143: Here it is implied that put1 and put2 mutant strains do not grow on SPD, but this is not stated explicitly.

      The put1 and put2 mutants are unable to grow in/on all media containing proline as sole nitrogen source. The phenotype is so tight that we were able to exploit it as a selection phenotype for reconstitution (Fig. 1A). We have adjusted the text to make this clear.

      9. 151: The abbreviation SPG is not explained in main text. This was explained in the methods (1% glycerol as primary carbon source).

      As suggested, we have defined SPG in the main text.

      10. Paragraph 156 onwards: this section is particularly hard to read and very dense. Also, it is difficult to understand the significance of these experiments for the overall findings of the paper. Please at least provide a small conclusion / summary at the end of the paragraph that puts the findings into perspective.

      We have adjusted the text to make it more accessible.

      11. Figure 2 C: simplifying the scheme (e.g. lots of redundant information, P2 and Mito - just give it one name) would help. This figure may be better in the supplementary material.

      The schematic of our subcellular fractionation study uses standard designations routinely used by the cell biology community. We believe that its inclusion will help readers judge how we mapped the intracellular localization of the reporter proteins, which is essential to understand the proline catabolic network.

      12. Figure 2B: It is not directly apparent from the micrographs that Put1-RFP localisation is mitochondrial. Co-localisation of the RFP with a mitochondrial dye (e.g., mitotracker) or something similar is required to validate it.

      We have previously reported that Put2 is a bona fide mitochondrial protein (by confocal microscopy, subcellular fractionation, and co-localization with Mitotracker Far Red; Silao et al., Ref 17). The fact that the Put1-RFP associated fluorescence exhibits a distinct mitochondrial signature, is spatially exclusive and exhibits no overlap with the cytosolic pattern of Gdh2-GFP, and co-fractionates with Put2-HA and the mitochondrial marker Atp1 should suffice to confirm that Put1-RFP is a mitochondrial localized protein.

      13. Throughout the manuscript (figure legends): Suggest using "mean" instead of "Ave."

      We have adjusted the legends.

      14. 175: According to the 'Yeasttract' and 'Pathoyeasttract' databases, Put1 regulates at least 36 and 22 genes, in S. cerev. and C. alb., respectively (based on DNA binding and/or regulatory changes). The only gene in common between these two lists of genes is PUT1. Thus, it is quite likely that Put3 regulates many other processes that explain its function and that its major function may not be only to regulate Put1.

      We assume that the reviewer is referring to Put3 (instead of Put1). Yes, Tebung et al. (2017) suggested that Put3 also regulates other genes. However, their data show that a C. albicans put3 mutant was unable to grow in medium (YCB+Pro) compared to SPD (2% glucose as carbon source) where proline is used merely as a nitrogen source (Tebung et al., Fig. 3A). Our data in Fig. 1C show that a put3 null strain exhibits residual growth on SPD, which aligns well with the expressed levels of PUT enzymes (Fig. 2D). Our conclusion is that despite being essential for rapid proline-dependent derepression of proline catabolic genes, Put3 is not the only transcription factor operating at the promoters of the PUT genes.

      15. 175: Is it clear whether the Put3-independent mechanisms are positive or negative with respect to Put1?

      We have accumulated evidence that an additional transcription factor positively regulates PUT1 expression and have a manuscript in preparation to describe this factor. The manuscript will focus on the Put3-independent regulation of PUT1, PUT2, and GDH2 expression.

      16. 218: Suggestion: "growth was indistinguishable". Unless growth curves or growth rates are provided, and if single time-point data are the basis for this point, then "rates" is not a relevant term.

      The reviewer is correct; we will adjust the text accordingly. We have performed growth assays in a multi-well microplate format (Bioscreen) and found that the growth rates are not statistically different between WT, put1, put2, and put1 put2 strains in the presence and absence of proline in SD with 2% glucose. This is consistent with glucose repression of mitochondrial function, i.e., proline toxicity depends on derepression of mitochondrial function.

      17. 256 onwards: did the authors test if the ROS scavenging effectively reduced ROS? i.e. does the luminol-HRP assay yield less ROS in +proline +scavenger treatment? This is necessary to effectively conclude that the growth inhibitory effect of proline is due to blocking respiration.

      Indeed, we used NAC as a control in the luminol-HRP system and we saw a reduction in ROS formation. In fact, this is the underlying reason why we used high levels of NAC for growth rescue (in Fig. 3D). We include the control data as Fig. S3F.

      18. The Figure captions are extremely lengthy and detailed, making it cumbersome to find the relevant information. Suggest moving some of the information, such as additional experimental details, into the methods section.

      We have streamlined the figure legends.

      19. 277-301: Phloxine is not exclusively a live/dead cell indicator-it is an indicator of metabolic activity. In S. cerev. and C. alb. it also indicates slower growth, opaque growth, and it has been used as an indicator of aneuploidy in C. glabrata (https://journals.asm.org/doi/10.1128/msphere.00260-22) and of diploids vs haploids in S. pombe. The colonies illustrated are made up of many live cells, and thus the section "Defective proline utilization is linked to cell death" needs to be presented more carefully. In addition, it appears that this section shifts from using defined medium to using rich medium and 37 °C instead of 30 °C. Why was this shift necessary?

      The reviewer is correct that phloxine (PXB) has been used to identify opaque growth (EFG1-dependent). However, the fact that the accumulation of PXB in the put mutants is evident in both SC5314 and cph1 efg1 backgrounds (Fig. 3G and Fig. S4C) suggests that we are not assaying opaque switching. We mention that we have observed an increase in the number of PI+ cells in put mutants under similar conditions, but as we pointed out, we were unable to reliably quantitate this by FACS due to the clumping of put mutants. Zheng et al 2022, the paper cited by the reviewer, used PXB to assess the ploidy of C. glabrata strains, but their assay was developed using 5 μg/ml PXB, half of the concentration we used. The homogeneous accumulation of PXB as the macrocolonies grow (Fig. 3G) suggests that the accumulation is not a consequence of spontaneously occurring ploidy variations. Thus, we believe that the accumulation of PXB does indeed reflect enhanced cell death. The point here is to trace the consequences of proline toxicity and to test the dependency on mitochondrial function. We used complex media, which contains multiple nitrogen sources (amino acids, peptides), to specifically highlight the contribution of proline catabolism in the fitness of C. albicans. The put1, put2 and put1 put2 mutants grow normally on YPD+PXB (30 °C) without accumulating the dye; we only observed visible PXB uptake in put2 after 2-3 days in mature macrocolonies. We attribute the gradual increase in PXB accumulation to glucose becoming limiting, which derepresses mitochondrial functions, a requisite for proline toxicity. Consistently, the accumulation is more evident in cells grown on non-fermentable C-sources (Fig. 3G and Fig. S4C).

      20. 295-301: Related to the point above, these results are hard to interpret due to the switch from defined medium in all prior experiments to rich growth medium here. Also, it is not clear why a 48h old YPD culture was chosen to show that the degree of PI staining correlates with mitochondrial activity - is this due to the culture age? It would be more clear to image cells grown on glucose vs. glycerol/lactate, or under repressive / de-repressive glucose concentrations (e.g., as shown in Fig. S4C where a PI+ difference is apparent for 0.2% glucose vs. 2% glucose at 30 °C).

      See response to Point 19 for our rationale to switch to rich medium. We have adjusted the text to enhance its readability. In liquid YPD, all strains grow; however, we noticed that the put mutants tend to flocculate (a sign of stress in yeast) when cells enter stationary phase, giving rise to erratic OD readings, particularly evident in the put1 mutant. At 48h, the cultures become dense and cells experience glucose limitation, derepress mitochondrial functions and exhibit maximal flocculation (Fig. S4D). In put mutants, the derepression of mitochondrial function results in proline sensitivity. We tested the notion that this would also increase cell death, which it does; see Fig. S4E.

      21. 313-14: The statement 'the invasion process was dependent on the ability of cells to catabolize proline' doesn't take into account that put mutant cells are defective in filamentous growth irrespective of their utilization of proline...and like the efg1 cph1 double mutant.

      Proline-induced filamentous growth is dependent on the catabolism of proline, which activates Efg1 and consequently the hyphal growth program. In Fig. 4A we show that put mutants grown on Spider media initiate filamentation (as evidenced by wrinkled colonies) but do not grow invasively (no halo). In Fig. 4B we developed and used a novel invasion assay to assess growth through a collagen plug. Similar to the control cph1 efg1 mutant, the put mutants exhibit drastically reduced capacity to penetrate through the plug and reach the D10 media in the transwell (D10 = DMEM with 10% FBS). However, it is important to note that these results are linked to two distinct processes: the filamentation defect of cph1 efg1 is due to the inability to respond to multiple filamentation cues (e.g., CO2, 10% FBS, etc.), whereas the filamentation defect of the put mutants is linked to the inability to catabolize proline and to its toxicity. Clearly, the WT strain relies on proline catabolism, with proline coming from one of three possible sources (see response to Reviewer 3): 1) DMEM/F-12 medium used in the PureCol EZ Gel; 2) diffusion of nutrients up through the collagen from the recovery medium DMEM supplemented with 10% FBS; and 3) the proteolytic breakdown of collagen. Also, in contrast to the put mutants, WT cells are refractory to inhibition by proline.

      22. 316-327: The results of the experiment described can only be interpreted as an effect of proline catabolism if the three strains (efg1 cph1; put1; put2) have similar growth rates as yeast cells in vitro. Why weren't the cells competed directly (efg1 cph1 vs put cells)?

      We believe that the relevant comparisons are to WT. We recovered cells from the top of the collagen (see Fig. 4B inset) to monitor their ability to survive and grow on top of the collagen. We found that the ability to catabolize proline enables WT and cph1 efg1 cells to grow equally well (the recovered ratio was similar to the starting input). This was not the case with the put mutants; they did not grow as well, and almost 100% of the cells recovered were WT.

      23. Fig 6: The logical order of the experiments, and in the text, is: 1) 4 h window, 2) 26 h window and then 3) ex vivo. The cartoon in 6B should be in this order as well.

      Thanks for bringing this issue up. We have adjusted the figure and text placing the schematic time-lines in proper order.

      24. 337: it is not clear what the 'direct exposure...' is trying to tell us. Can this be made more explicit?

      The direct exposure means that the fungal cells are in contact with the culture media at the edges/border of the 3D skin model (see schematic diagram). Hence, fungal cells are in direct contact with 10% FBS, facilitating the observed filamentous growth. The inability of the put mutants to invade the skin model should be evaluated at the center of the artificial epithelium where there is likely a local increased concentration of proline stemming from the proteolytic activities associated with fibroblasts and keratinocytes.

      25. 340-346: Here proteins with high proline content were used to ask if they could induce transcription of PUT1 or PUT2 RNA and protein. This experiment is designed only to test the role of these proteins to induce utilization of nitrogen, as glucose is included in the medium. Given that these proline-rich proteins need to be lysed by proteases before they can be imported, and since no import pathways were tested, the results appear to tell us that mucin is more readily digested to peptides that contain proline-but why that is the case is not clear and how it relates to proline utilization is also not clear.

      We thank the reviewer for raising this important point. First, we monitored protein, not mRNA, levels. We will adjust the text to provide better context for this experiment. Briefly, these experiments were initiated as we were perplexed as to why the wildtype cells took such a long time (14 days) to fully invade the collagen matrix (Fig. 4B); we naïvely assumed that fungal cells would secrete proteases to degrade the collagen and assimilate the liberated proline. Going forward, our experimental strategy was to incubate various proteins with a dense culture of cells in HBSS medium (pH 7.4) supplemented with low glucose (3.8 mM) and lactate (0.83 mM). This condition mimics interstitial fluid, where most broad range proteolytic enzymes are inactive or at least operating suboptimally. The results were clear; with the exception of mucin, the proteins did not stimulate Put1 or Put2 expression. We conclude that host-dependent processes play an important role in the release of the amino acids/peptides from these high-proline content proteins (see lines 531-553 for discussion). The capacity of mucin to efficiently induce Put1 expression is interesting since mucin is abundant in the gut, where systemic infections are thought to originate. It is important to be cautious here: we used a commercial mucin preparation (Sigma, 2 batches) that may contain degradation products, e.g., proline-rich peptides, that can easily be assimilated by C. albicans. Put1 expression is an excellent readout for proline uptake since its expression responds tightly to the presence of proline derived from exogenous supply or from intracellular conversion (Fig. 2D, S2A, S2B).

      26. 363-369: An alternative is that Put3 induces different proteins important for growth.

      We included this possibility in the revised text.

      27. 379-380-the conclusion for this paragraph is somewhat of an overstatement as there is no analysis of the degree to which proline utilization is a predictor of virulence. It simply shows that put mutants affect the ability to survive in neutrophils.

      We have adjusted the text.

      28. Discussion: The statement that "S. cerevisiae" evolved in high sugar environments is debatable. The natural niche could well be forest soil and tree bark, or insect/wasp guts with arguably little glucose around.

      The reviewer is correct: S. cerevisiae can be isolated from diverse environments with variable sugar contents, but it is the capacity to deal with high sugar environments that makes this yeast stand out in comparison to Candida spp. The unique attributes of S. cerevisiae have been exploited and have truly benefited humankind in making alcohol and bread. We have amended the text to state this more accurately.

      1. 469-470-how strong is the 'correlation' between the ability to utilize proline and virulence? Given that different mutants had different effects in different models, this seems like a very loose 'correlation'; it would be good to have some quantitative measures to make this claim.

      We have used directed genetic approaches to determine whether a gene/protein is essential for virulence by testing the mutants in currently available infection models. It is important to note that all virulence assays provided a consistent and clear read-out, namely that the inability to catabolize proline significantly reduced the expression of virulence characteristics. Presumably the differences we report are due to the specific nutrient composition (proline and metabolites feeding into the proline catabolic network) and physical parameters intrinsic to each model. In fact, the expression of virulence factors (i.e., hyphal growth) can differ significantly between organs within the same mouse model (Lionakis et al., 2013), and virulence outcomes can change depending on mouse background. We fail to see how this can be viewed as loose. This has not been shown before. Please refer to our response to major point 6.

      1. 500: Was the experiment done in larvae, and not in adult Drosophila? The Fig. 5 legend says flies and shows a picture of a fly, and larvae are only mentioned much later in the text.

      These experiments were performed using adult flies. We now include a reference regarding the levels of arginine in hemolymph in both larvae and adult Drosophila (Priyankage et al., 2012; Anal Chem).

      1. 512: Why is it presumed that proline accumulates in the mitochondria in put1 mutants? How strong is the presumption?

      Despite a great deal of effort in many labs, the mechanism of proline transport across the mitochondrial membrane is not known. What has been shown in mammalian and plant systems is that proline can readily enter and accumulate in mitochondria, where it is catabolized (https://link.springer.com/article/10.1007/s00425-005-0166-z; https://www.sciencedirect.com/science/article/pii/0003986177902089). Our presumption that proline accumulates in the mitochondria is based on our finding that proline inhibits mitochondrial respiration when Put1, which catalyzes the first oxidation reaction, is absent.

      1. 539: why are MMPs important for digestion of collagen? This is not clear at this point of the Discussion.

      In mammalian cells, some secreted MMPs (e.g., MMP-1) have collagenase activity that degrades proteins of the extracellular matrix, which releases proline. We emphasize this since the 3D skin model is composed of dermal fibroblasts and keratinocytes that are known to secrete MMPs (Ref. 69).

      1. 574: Concluding sentence of this paragraph seems unsubstantiated. There are at least two defects in put2 strains-hyphal growth and growth in general, presumably because of P5C accumulation.

      See response to point 21. Proline-induced filamentous growth is dependent on its catabolism, which activates Efg1 and consequently the hyphal growth program. However, there are many potential cues in hosts that could induce hyphal growth in situ. Our finding that strains unable to catabolize proline do not filament indicates that proline is a key modulator of virulence.

      1. Fewer abbreviations would make the manuscript easier for non-experts to read. For example, P5C is not defined in the abstract. Furthermore, if an abbreviation is not used more than 3 times, it is not necessary to provide it (e.g., mammalian proteins in the last paragraph).

      We have adjusted the text.

      typos:

      1. 82: should read 'is restricted to the mitoch...'

      2. 102-103: should read 'to evade macrophages'

      3. Fig. S4F is mislabelled as Fig. S4G.

      Thanks!

      **Referees cross-commenting**

      Overall, we stand by our initial assessment of the study. However, we were not aware of previous studies that investigated proline utilization in yeasts, as noted by Rev # 2 (https://onlinelibrary.wiley.com/doi/epdf/10.1002/yea.1845). The current study suggests that using proline as an energy/carbon source is more wide-spread, beyond pathogenic yeasts. Further, the C. albicans strain they used for this study (ATCC 10231) was apparently unable to grow on proline in the quoted paper. In light of this, we think the authors should reference this study, tone down the claims about the clear correlation of pathogenicity and proline utilization, and address this apparent discrepancy with the indicated Candida albicans isolate. We note that our review considered this a paper mostly of interest to specialists.

      Although other non-pathogenic fungi have been shown to use proline as pointed out by Reviewer 2, this metabolic attribute has not been previously tested in members of the pathogenic Candida spp. complex. We have included the reference and included a statement that many fungi, isolated from diverse environmental niches, can use proline as a carbon source.

      Reviewer #1 (Significance (Required)):

      1. The advance in this paper is conceptual for the proline utilization connection to virulence in a range of species and technical for the in vivo microscopy. Limitations are that the conceptual advance is based only on qualitative work in figure 1 and that the animal studies do not provide a conceptual advance, although the technical advance of in vivo visualization of kidney tissue is impressive and (to the knowledge of this reviewer) quite new as the only prior work was in mouse ears.

      In response to the reviewer’s comment regarding Fig. 1, although it is qualitative, it is very reproducible. We even tried several clinical isolates of S. cerevisiae and observed behavior consistent with that of the standard laboratory strains (i.e., they do not grow on SP medium, where proline is used as the sole carbon/nitrogen/energy source). We tried to quantify growth of all strains in liquid SP medium at 30 °C using a TECAN microplate reader, but the results showed very erratic readings among species (and replicates) as each behaves differently; C. tropicalis, C. krusei, and C. parapsilosis form pseudohyphae and clump readily, while C. albicans forms hyphae and pseudohyphae.

      2. The work fits well as an extension of the body of work from the corresponding author's lab with additions from the labs with expertise in models of infection.

      1. People interested in yeast metabolism and pathogenic yeast virulence will be the audience for this paper and as written it is for a specialized audience interested in pathogenic yeast metabolism and, perhaps, (although not mentioned at all in the text) for those who want to try PUT gene products as new drug targets.

      This was actually mentioned in the last paragraph of the discussion (line 581-582).

      1. Reviewer expertise is in pathogenic yeast biology and yeast metabolism. Little expertise in high tech microscopy.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The study is part of the continuous work by the authors to dissect the mechanism of utilization of proline as a carbon source in Candida spp. In particular, this work shows that the inability to process proline leads to accumulation of the toxic intermediate P5C and subsequent inhibition of mitochondrial respiration and toxic effect on the cells. Furthermore, the study demonstrates that proline utilization is important for C. albicans kidney colonization. The experiments are meticulously designed and the study adds to the overall understanding of the metabolic utilization of proline as a carbon source and its potential relevance for infection.

      I find this work interesting, but the role of Put1 and Put2 in proline utilization is not particularly novel. The novelty here is the subcellular localization of the two proteins. Also, the importance of proline utilization for infection is unclear. The host-pathogen interaction assays are ambiguous as each assay gives a different result. Lastly, the authors try to generalize the importance of the use of proline as an energy source by other Candida spp. This is not very surprising, given that it has been reported previously by others (example DOI: 10.1002/yea.1845) and that many species that are pathogenic or closely related to C. albicans use various amino acids, not only proline, as a carbon source.

      Yes, like reviewer 2, we are not surprised that many of the pathogenic members of the Candida spp. complex are able to use proline, but this needed to be checked. The fact that proline can be used as a sole carbon/nitrogen/energy source clearly sets them apart from the paradigm yeast S. cerevisiae. A major question is which amino acids are important in the context of the host. To assess this, we have used mutations that specifically block proline utilization. Our past studies demonstrating that proline catabolism is rapidly activated in C. albicans cells phagocytized by macrophages indicate that proline is present in the phagosomal compartment. Furthermore, put mutations clearly affect virulence in flies and murine systems. We are at a loss to understand why the reviewer believes that our data, which consistently show that proline catabolism is important, are ambiguous.

      The expectation that all three mutant strains, i.e., put1, put2 and put3, would behave identically in the different infection models reflects an unnuanced view of how infection works. In fact, differences considered trivial, such as the choice of mouse background, can have profound effects on virulence. Consequently, it is striking how the diverse infection models consistently and unequivocally demonstrate that proline catabolism affects virulence. Also, it should be appreciated that we are not testing mutations affecting proteins with many overlapping functions, where it may be appropriate to challenge claims as to their direct role in virulence. Here we tested mutants that lack the enzymes that catalyze proline utilization. A more reasonable expectation is that virulence is commensurate with the specific nutrient composition of the model systems (as asked by reviewer #1), which can fluctuate among models (see our response to major comment 6 of reviewer 1). As it is not practical to precisely test the proline levels in the models, we have worked to identify and focus on critical phenotypes that can be analyzed in vitro. Our findings provide the basis for understanding the virulence and growth properties of the mutants in the context of the complex infection models.

      Moreover, the authors take C. albicans as an example to demonstrate the role of PUT in invasion and infection. Proline is a known stimulus for hyphal growth in this species, but many other Candida spp., including C. auris, do not filament. So how, aside from supporting growth, is proline linked to infection in these species? I think the authors oversell the importance of proline in Candida spp. pathogenesis and should tone this part down or remove it completely. A new story that validates the importance of PUT in non-albicans species can bring clarity to why and where proline is critical for survival and infection.

      The fact that proline supports growth in the host environment is one of the critical aspects of our work. The lack of appreciation for this finding represents a common misconception in infection biology. It is not just the ability to gain access to a host and initiate an infection that counts; it is equally important to sustain growth and to thrive within the host. Thus, adaptation to the host environment is critical. Here we document that proline catabolism not only initiates but also sustains an infection, acting as a critical carbon/energy source. The inability of the put1 and put2 mutants, which are sensitive to proline, to grow and infect multiple models clearly suggests that a substantial quantity of proline is accessible. Also, we have constructed C. glabrata (Fig. S1C) and C. auris (not shown) strains that lack the ability to catabolize proline, and are currently characterizing the virulence properties of these strains. This is outside the scope of the present study.

      Major comments: I am not convinced by the data that proline is important to initiate infection. Candida infections of the kidney occur only at late stages of sepsis. The authors need more compelling data to prove that proline is important for infection in the host.

      Again, we are not sure why there is such skepticism here. Regardless of whether kidney infections occur late, the fact that, in contrast to WT, we do not observe put mutants filamenting clearly suggests that the capacity to catabolize proline plays a role in the expression of virulence characteristics of C. albicans. Based on our findings using IVM, which provides 3D information, we can at least conclude that a single isolated C. albicans cell can initiate hyphal growth, initiating a point of infection. In addition, our newly added whole human blood data suggest that proline catabolism is required for survival in the blood; human blood contains high amounts of proline, arginine, and ornithine, which are all catabolized via the proline catabolic network.

      Minor comments: I find the manuscript difficult to read and the discussion part is overly long. Some streamlining and adding a bit more explanation for the rationale of each experiment will make the work easier to follow. Some language/style needs refining as well.

      We have attempted to take this critique into account during the revision of the manuscript and have streamlined the text and added explanations regarding the rationale underlying our experimental approaches.

      **Referees cross-commenting**

      In this manuscript the authors clarify the cellular compartmentalization of steps in proline catabolism. However, it is not novel that proline is a valuable carbon source. The role of proline utilization for establishing or progression of infection remains ambiguous even after the authors provide different in vivo results. The overall significance of the study is limited.

      Please refer to our comments below. We do not understand why the reviewers apparently question the obvious role of proline utilization in facilitating virulence.

      Reviewer #2 (Significance (Required)):

      The strengths of this study are in the experimental design and variety. The data is well presented and visualized. The limitations are as pointed above - I find it especially difficult to figure out where, in a real infection scenario (e.g. breach of the gut barrier and entry into the bloodstream) proline will be the primary energy source. To me the significance of this work is minor.

      C. albicans is the primary human fungal pathogen placed under the “Critical Priority Group” by WHO, and yet our understanding of nutrient assimilation in this fungal pathogen is only a fraction of what is known in the model yeast S. cerevisiae, which has proven not to be the best paradigm for understanding the regulatory circuits operating in human fungal pathogens. This manuscript, as well as other recent publications, has revisited and corrected earlier assumptions regarding C. albicans growth, providing novel information that reflects important regulatory differences specifically relevant to the life of C. albicans in the host. For example, had it not been for the recent findings (Ref. 10, 18, 31) showing that proline utilization in C. albicans is not subject to nitrogen catabolite repression (NCR) and that glucose represses mitochondrial function, the perception in the field would remain that C. albicans cannot utilize proline as a carbon and/or nitrogen source in the presence of a “preferred” source of nitrogen, which is applicable in the blood, which contains high concentrations of possible sources of carbon and nitrogen. Furthermore, the low but constitutive expression of Put2 and the tight, highly responsive expression of Put1 in response to proline (Fig. 2D, S2A, S2B) suggest that C. albicans is well equipped to productively anticipate proline availability depending on the host status, entirely consistent with its “opportunistic” character. The many incorrect and previously held assumptions regarding C. albicans, uncritically propagated in several influential reviews, have likely hampered efforts to develop novel antifungal therapies. We neither understand nor accept the view that a more precise understanding of proline catabolism is incremental.

      The type of question raised by the reviewer is exactly what we hope to address in the future, but to get there we have to have correct assumptions in place, and this is only possible if we have a more thorough understanding of the regulatory mechanisms driving proline utilization in C. albicans. The idea that certain proteins are refractory to degradation by C. albicans suggests that other external factors are triggering the release of amino acids from these proteins. This work, however, suggests that proline is likely accessible in the gut due to the presence of proline-rich proteins like mucin (Fig. S5A/B).

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript of Silao et al. describes an in-depth investigation of the role of Put1 and Put2 enzymes in proline catabolism and virulence in Candida albicans. This is an extension of previous work in this system. The basic biochemistry and genetics are solid and support the role of these enzymes in the proposed pathway and provide evidence that the build-up of a toxic intermediate in the absence of Put2 is likely involved in the poor growth of the strain when proline is the only carbon source.

      Note that we observe the toxic effects of proline even when it is not the sole carbon source, however, and importantly, toxicity is dependent on mitochondrial function, which is repressed by high levels of glucose. Proline toxicity is observed when glycerol/lactate are present as carbon sources in addition to proline. Under these conditions, mitochondria are not repressed and exogenous proline impairs growth, particularly evident in put2 cells that accumulate the toxic intermediate P5C.

      The conclusions regarding its role in virulence are less convincing, particularly the data derived from the collagen invasion assay, the ex vivo skin model and the ex vivo/in vivo imaging. The survival and fungal burden assays support a modest role in virulence and a modest reduction in infectivity (although the presented data for survival do not have statistical significance reported for the Kaplan analysis).

      See below for response regarding collagen assay. We have included the significance values derived from Kaplan analysis in the revised Fig. 5B.

      The manuscript is clearly written. The methods are well described.

      **Referees cross-commenting**

      I remain unconvinced of the broad significance of the advances and stand by my assessment that this is for the most part a reasonable study but does not move the field forward. The novel technical aspects are either extensions of previous in vivo imaging or are not well controlled (collagen invasion assay).

      See below for response.

      Reviewer #3 (Significance (Required)):

      This is a detailed study of an area that is fairly mature and thus will be of interest to those in the field but does not represent a large advance and is thus truly incremental.

      See below for response.

      Major limitations of the work are as follows. First, the collagen invasion assay may be flawed. The recovery medium is made with DMEM, which is a medium that lacks proline and is fairly stringent. Control experiments need to be done to be sure that the mutants grow in the recovery medium. Second, the data from the RHE model are hard to interpret since so few cells are present in the tissue. It is hard to see if there are few filaments or if there are just too few cells to assess in the tissue. Third, in vitro experiments assessing the filamentation of the mutants in the medium in which these assays are performed need to be done as controls. Candida albicans filaments in many conditions such as tissue culture medium. Spider medium is a strong inducer of filamentation but is very different from in vivo/ex vivo conditions.

      Related to the collagen invasion assay, there is a misunderstanding. The reviewer appears to confuse the put mutations with proline auxotrophy. The put mutants are proline prototrophs and can synthesize proline as they possess a full repertoire of biosynthetic enzymes. In contrast, the put mutants cannot utilize proline to obtain nitrogen or energy. In fact, the presence of excess proline is toxic to the put mutants. There are three possible sources of proline. 1) PureCol EZ Gel is a ready-to-use collagen solution that forms a firm gel when warmed to 37 °C. It contains purified Type I bovine collagen (5 mg/ml) dissolved in DMEM/F-12 medium, which has multiple amino acids, including a substantial amount of arginine. 2) The recovery medium is DMEM supplemented with 10% FBS. The presence of FBS provides amino acids and induces filamentous growth. As the reviewer points out, C. albicans grows in this medium and exhibits filamentous growth. 3) The proteolytic breakdown of collagen is expected to liberate proline. Consequently, the poor growth of the mutants clearly demonstrates the importance of proline catabolism. Also, the fact that we recovered put mutants surviving on top of the collagen (Fig. 4B, inset) suggests that they remain viable but simply are unable to efficiently invade the collagen. Consistently, microscopic inspection of the wells of the put mutants showed extremely few or even a complete absence of invading cells in the recovery medium. We will adjust the text and provide a more detailed description of the experimental set-up. In summary, the main concern of the reviewer with respect to lack of proline is not relevant.

      Regarding the 3D-skin model, equal numbers of fungal cells were applied on top of the RHE. To avoid overgrowth, only low numbers (100 C. albicans cells) can be applied for the WT strain, and consequently for all other strains. In contrast to WT, which clearly proliferates, the apparent low level of put1 and put2 cells at the center of the 3D skin model is the consequence of poor growth. The upper layer of the RHE consists of stratified keratinocytes. To grow, WT fungal cells obtain proline either directly from the keratinocytes, from secreted proteases that liberate proline from keratin (proline is not as abundant in keratin as in collagen, the main component of the dermis), or from the medium that basolaterally feeds the RHE. At the border of the model, leakage from the medium can occur. Our results, showing poor growth of the mutants in the center of the 3D-skin model, are entirely consistent with the collagen plug experiments and indicate that proline catabolism plays a determinant role in enabling invasive growth.

      Lastly, the imaging experiments are highly problematic. First, reference must be made to previous ex vivo imaging reported by the Lionakis lab in 2013. Second, the number of cells imaged is so low that there is no power to make any conclusions. At 24 hr, the mutants may be delayed in filamentation or they may be delayed in establishing infection. There is no way to know what is causing the apparent lack of filaments. This technique as presented is not any higher resolution than traditional histology and in fact histology would provide a more convincing case for reduced filamentation.

      These considerations significantly reduce the overall significance of the work.

      I work on Candida albicans.

      We thank the reviewer for highlighting the beautiful study by Lionakis et al., which documents the host response, specifically the role of macrophages in mitigating C. albicans infection of the kidney. However, the reviewer apparently failed to recognize that their method differs completely from ours. Lionakis et al. performed ex vivo imaging of kidney slices using regular confocal imaging, and the authors express an awareness of the limitations of this approach. In fact, these authors even state in their discussion that intravital microscopy should be pursued in the future to further investigate Candida-macrophage interactions in the kidney. Also, they point out that kidney-specific factors seem to facilitate rapid filamentous growth of C. albicans. In our work, we have experimentally addressed both of these astute statements. To our knowledge, our work is the first report of imaging a Candida cell infecting a kidney in a living mouse, which on its own is a major development and achievement considering the complexity of the kidney microenvironment. The finding that the put2 mutant does not exhibit filamentous growth in the kidney of a living mouse (24 h) is striking and strongly suggests that a substantial quantity of proline, or of amino acids (e.g., arginine) that are metabolized via the proline catabolic network, is present in the kidney. This is clear based on the finding that WT C. albicans cells respond accordingly to initiate hyphal growth. Consistent with this, it is well documented that the kidney is a major metabolic hub for arginine and proline metabolism. The work by Lionakis et al. aligns remarkably well with our previous and current work in that put mutants exhibit greatly reduced survivability in co-culture with macrophages and do not evade these primary immune cells due to their inability to induce filamentous growth within the phagosome (Silao et al., 2019).
      We have adjusted the text to include a discussion that places our work in the context of the Lionakis work.

      We have added a Fig. 6C showing an example of the scanned area of the kidney. Further we added the following in the revised legend to indicate that large areas of kidneys were imaged in our assessment of fungal growth and filamentation:

      “Sites of colonization were localized using a spiral scan in the Las-X Navigator-module in the FITC channel. The entire area of the renal surface attached to the glass imaging window was scanned; circles highlight examples of regions of interest (ROI) exhibiting stronger and deviating fluorescence from the background. Each ROI was examined in detail using FITC, yEmRFP and autofluorescence. Scale bar, 500 µm.”

      CONCLUDING STATEMENT – SUMMARY RESPONSE:

      Our current work is based on our previous discovery that proline metabolism provides energy to induce and support filamentous growth (PLoS Genetics, 2019). This turned out to be important since we also discovered that C. albicans cells depend on mitochondrial proline metabolism to evade engulfing macrophages, implicating this process as being an important virulence determinant. Consistently, using time-lapse microscopy, we subsequently found that proline catabolic enzymes are rapidly induced in C. albicans cells upon phagocytosis by macrophages. These results demonstrated that proline is present within phagosomes. As exciting as these findings are, they focused on a single phenotype, i.e., filamentation, and were obtained using in vitro experimental approaches. These results demanded that we pursue additional avenues to further characterize and test the in vivo relevance, and they merely provide a solid background for the current work.

      In contrast to reviewers 2 and 3, we do not regard our finding that proline catabolism plays such a critical role in virulence as being merely “incremental”. We also could not have foreseen that the ability to use proline as an energy source is a common feature of multiple fungal pathogens capable of causing human disease. This is conceptually very important in that human fungal pathogens, unlike the well-studied yeast Saccharomyces cerevisiae, are not readily found out in nature, and thus have evolved to use a similar spectrum of nutrients as host cells, including cancer cells. It is important for the fungal pathogen community to realize that regulatory switches operating in C. albicans are wired substantially differently from those in S. cerevisiae, and are likely optimized to reflect the actual conditions in the host environment. The growing appreciation that diverse cancers are able to shift metabolism to exploit proline as an energy source is strikingly and fascinatingly similar to our findings with pathogenic fungi. This represents a conceptual advance in that it points to the wealth of proline stored within extracellular matrix proteins as providing a potential and significant source of energy for virulent fungal and cancerous growth.

      Finally, we strongly believe it is improper to extrapolate virulence properties based on in vitro findings, and that it is essential to actually test host-microbial pathogen interactions using refined in vivo models. Our successful use of advanced intravital microscopy goes beyond traditional and accepted murine infection models and has provided us with a unique state-of-the-art vantage point. Our finding that a single C. albicans cell is able to initiate and establish a site of infection in a kidney within a living mouse is itself important, and the accompanying novel finding that hyphal development at sites of infection depends on the ability of the fungal cells to catabolize proline must reflect the physiological conditions in the kidney. This is not an incremental finding, and we do not understand why reviewers 2 and 3 diminish the significance of these findings. Clearly, our manuscript provides a strong foundation for more detailed and advanced studies.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Silao et al. make the intriguing observation that yeasts that are generally considered less pathogenic than Candida albicans are unable to catabolize proline. They then, in Candida albicans, construct mutants defective for the two key enzymes (Put1, Put2) required to convert proline to glutamate, which they show to be essential for proline utilization as an energy (carbon) and nitrogen source. The authors proceed to untangle the regulatory aspects of proline degradation, including the respective cellular localization of its key enzymes. They then make the important discovery that strains lacking either Put1 or Put2 suffer from a proline-dependent growth defect, which they attribute to resulting defects in mitochondrial metabolism.

      The manuscript then goes on to analyze a broad range of infection models including: reconstituted human epithelial skin model, Drosophila, mouse systemic infections, organ colonization in these mice (kidney, spleen, brain, liver and histochemistry of the kidneys) as well as survival when incubated with cultured human neutrophils. Finally, they use yeast cells constitutively expressing yEmRFP (so that yeasts can be distinguished from other host cells) and coated with FITC before incubation with the host cells (which coats the wall of the original cells, but does not spread to progeny) and they go on to perform an impressive set of analyses of C. albicans growth within mouse kidneys both in vivo and ex vivo, exploiting an implanted window together with intravital imaging with a two photon microscope at different time points. The system is impressive and visualizes tissue invasion by hyphal cells beautifully. Finally, they compare the intra vital images from WT and put2-/- cells and show that, as in vitro, put2-/- cells do not form filaments and do not show extensive invasion of the kidney tissue. While the in vivo aspect of the study includes many different models, it finds defects in virulence for different subsets of put mutants and the relative importance of filamentation vs proline utilization for virulence is not conclusively resolved.

      Overall, this is an important and timely manuscript, which significantly contributes to the understanding of how proline metabolism intersects with yeast fitness in the context of infections. However, there are several major concerns regarding some of the conclusions drawn from the study. In addition, some general recommendations that would improve the manuscript are provided.

      Specifically, the manuscript provides a very detailed description of experiments and observations. However, in several parts it is difficult to follow and the reader needs more guidance about the logic involved in reaching conclusions. Several aspects of the paper are written for experts in Candida (yeast) metabolism. Here, explaining the rationale for some of the experiments, and providing more background information that is not obvious to a non-expert, is required.

      In particular, writing a clear and measured summary sentence at the end of each paragraph and a conclusion paragraph that summarizes key findings in simple terms would help make the manuscript more digestible for readers.

      In addition, the impressive microscopy and broad range of in vivo experiments are comprehensive but add only incremental information relevant to proline metabolism: that filamentous growth in vivo and virulence are reduced in cells carrying some mutations in one or more put genes. However, this broad sweep of model systems and the development of the in vivo imaging system might have more impact in a separate paper focused on the real-time in vivo visualization of kidney invasion.

      Major comments:

      1. The main finding that impressed this reviewer is that "removing the ability to catabolize proline, in an organism that evolved to catabolize it, leads to (growth) defects". This point could be better highlighted throughout the manuscript.
      2. The authors show that deletion strains for proline metabolism have defects that are important for in vivo pathogenicity. This is an important finding. However, as the manuscript reads now, it suggests that the main findings are that the ability to use proline in the respective host niche is key. Mechanistically, the manuscript revolves primarily around defects that arise when deleting PUT1 and/or PUT2 (i.e., an "unknown" toxicity of proline in the case of put1-/- (or put1-/- put2-/-) and the additional P5C-dependent toxicity for put2-/- mutants; see below).
      3. In order to claim that catabolizing proline promotes pathogenicity (as opposed to the alternative hypothesis that the inability to catabolize proline leads to the observed defects), additional experiments would be required. For example, the put mutants would need to be compared with mutants that significantly reduce/impair proline uptake, such as the referenced gnp2 mutant (Garbe et al 2022). While the finding that less pathogenic yeast species are unable to catabolize proline is both intriguing and important, as presented it remains a loose, non-quantitative correlation that only tangentially addresses the question of whether "proline catabolism is key for pathogenicity".
      4. 238 onwards: The conclusion that "the primary growth inhibitory effect of proline is linked to catabolic intermediates formed by Put1 and that are metabolized further by Put2" does not appear to be fully supported by the evidence. Addition of proline to put1 mutants already reduced OD600 by ~50% (Figure 2), and it is further reduced to ~10% when put2 is deleted. This implies that there are two inhibitory effects of proline, not one primary one. At the least, this option should be discussed, including why deletion of PUT1 leads to proline toxicity. The latter is not clear: is it that too much proline accumulates in the cell and this accumulation is toxic? If this is the case, the effect would be expected to be proline concentration dependent. Performing a relatively simple experiment as performed for the put2 mutant (Fig. 3 / S3F) may clarify this issue, particularly if the experiment were coupled with intracellular quantification of proline.
      5. The caption "P5C mediates a respiratory block" is misleading, as the evidence is not that compelling: P5C increases in put2, but not in put1 mutants, yet both single mutants experience a proline-dependent respiratory defect (Fig. 3E). The results therefore suggest a more complex relationship.
      6. The virulence assays and in vivo experiments do not present a unifying view: in Drosophila put2∆∆ is less virulent than put1∆∆, which appears similar to put3∆∆. Given that put2 mutants grow slowly, likely because of P5C inhibition, this seems logical. However, in mice, put3∆∆ remains highly virulent while put1∆∆ and put2∆∆ results for survival are mixed. Furthermore, in 4 mouse organs, put1∆∆ and put2∆∆ are not significantly different from one another but are different from wt, while put3∆∆ has no significant reduction in CFU. Kidney histology shows very little invasion by put1 and put2 and more by put3, but visually put3 appears to invade much less than the WT, and the human neutrophil experiment shows effects of put2 or put3 but not put1. This leaves the reader rather confused. It may be worth discussing the reasons for different results in different models. Is the availability of proline in each of the organisms and organs similar?
      7. The ex vivo and in vivo analysis of the dynamics of C. albicans growth in the host is visually impressive, but it distracts from the focus of the paper and the metabolic findings. Showing that put mutant cells do not form filaments in vivo (as in vitro) does not add much conceptually to the paper. Furthermore, this lovely advance in in vivo visualization is lost at the end of this paper, and the authors should consider whether it might fit better in a manuscript that could really highlight the in vivo visualization approach.
      8. The discussion of cells stained with FITC and expressing yEmRFP does not clearly point out that the FITC is only an indicator for those cells that were used to inoculate the tissue and that finding cells without FITC indicates that they are mitotic progeny, i.e., that they have been dividing. The authors clearly understand this, but a naive reader may miss this important point if it is not stated explicitly.

      Minor comments:

      1. Throughout: what is the distinction between utilization of proline for C or for energy? These terms seem to be used interchangeably.
      2. Introducing the schematic in Fig. 2A at the beginning of Figure 1 would help explain proline catabolism before delving into the growth experiments that rely upon this framework. This should include an explanation, for readers less familiar with the metabolic issues, of the main limitations to catabolizing proline, and the key issues for being able to use proline for nitrogen, carbon, and energy (potentially indicated in the overview figure, e.g. pointing towards gluconeogenesis etc.).
      3. Saccharomyces can only grow on proline as a nitrogen source, but not as energy/carbon source. Could the authors briefly mention or discuss why this is the case? This is not clearly apparent after reading the manuscript and it leaves the reader confused and trying to understand if the fact that proline is required for carbon utilization is a new finding of this paper or was already known. Do the authors think this is tied to the presence of complex 1 components in C. albicans that are not found in S. cerevisiae? Is this consistent for the pathogenic, but not the non-pathogenic yeasts analyzed in figure 1?
      4. 100: While Gdh2 is apparently an important enzyme for generating ammonium, why is it not necessary for macrophage escape and virulence as shown in reference 18? A recent paper from Garbe et al (ref 12) suggests that Gnp2 is the major proline permease in C. albicans and what is known, and not known, about proline uptake would be good to mention, given that PUT gene functions require that proline enters the cells.
      5. 116: Is the "low sugar environment of the host" referring to a specific niche, such as the GI tract, or human blood? Compared to most natural environments, glucose is abundant in the host, e.g., at ~5 mM, it is the most abundant metabolite in blood, and similarly, in the GI tract, levels can go beyond 50 mM glucose (see e.g. PMIDs 34371983, 21359215). Or is this comment indicating that the in vivo sugar concentration is lower than that in common lab growth media? Please spell out the niche/concentration for clarification - and compare that to other niches that are considered "high sugar environments".
      6. 123: "proline as sole energy source" - suggest "is the source of carbon, nitrogen, and energy"
      7. 142: it is worth noting to readers that C. neoformans is a basidiomycete and thus VERY distant from the other yeasts studied here; it is in a different major phylum of fungi.
      8. 143: Here it is implied that put1 and put2 mutant strains do not grow on SPD, but this is not stated explicitly.
      9. 151: The abbreviation SPG is not explained in main text.
      10. Paragraph 156 onwards: this section is particularly hard to read and very dense. Also, it is difficult to understand the significance of these experiments for the overall findings of the paper. Please at least provide a small conclusion / summary at the end of the paragraph that puts the findings into perspective.
      11. Figure 2 C: simplifying the scheme (e.g. lots of redundant information, P2 and Mito - just give it one name) would help. This figure may be better in the supplementary material.
      12. Figure 2B: It is not directly apparent from the micrographs that Put1-RFP localisation is mitochondrial. Co-localisation of the RFP with a mitochondrial dye (e.g., mitotracker) or something similar is required to validate it.
      13. Throughout the manuscript (figure legends): Suggest using "mean" instead of "Ave."
      14. 175: According to the 'Yeasttract' and 'Pathoyeasttract' databases, Put3 regulates at least 36 and 22 genes, in S. cerev. and C. alb., respectively (based on DNA binding and/or regulatory changes). The only gene in common between these two lists of genes is PUT1. Thus, it is quite likely that Put3 regulates many other processes that explain its function and that its major function may not be only to regulate Put1.
      15. 175: Is it clear whether the Put3-independent mechanisms are positive or negative with respect to Put1?
      16. 218: Suggestion: "growth was indistinguishable". Unless growth curves or growth rates are provided, and if one time-point data are the basis for this point, then "rates" is not a relevant term.
      17. 256 onwards: did the authors test if the ROS scavenging effectively reduced ROS? i.e. does the luminol-HRP assay yield less ROS in +proline +scavenger treatment? This is necessary to effectively conclude that the growth inhibitory effect of proline is due to blocking respiration.
      18. The Figure captions are extremely lengthy and detailed, making it cumbersome to find the relevant information. Suggest moving some of the information, such as additional experimental details, into the methods section.
      19. 277-301: Phloxine is not exclusively a live/dead cell indicator; it is an indicator of metabolic activity. In S. cerev. and C. alb. it also indicates slower growth, opaque growth, and it has been used as an indicator of aneuploidy in C. glabrata (https://journals.asm.org/doi/10.1128/msphere.00260-22) and of diploids vs haploids in S. pombe. The colonies illustrated are made up of many live cells, and thus the section "Defective proline utilization is linked to cell death" needs to be presented more carefully. In addition, it appears that this section shifts from using defined medium to using rich medium and 37°C instead of 30°C. Why was this shift necessary?
      20. 295-301: Related to the point above, these results are hard to interpret due to the switch from defined medium in all prior experiments to rich growth medium here. Also, it is not clear why a 48h old YPD culture was chosen to show that the degree of PI staining correlates with mitochondrial activity - is this due to the culture age? It would be more clear to image cells grown on glucose vs. glycerol/lactate, or under repressive / de-repressive glucose concentrations (e.g., as shown in Fig. S4C where a PI+ difference is apparent for 0.2% glucose vs. 2% glucose at 30°C).
      21. 313-14: The statement 'the invasion process was dependent on the ability of cells to catabolize proline' doesn't take into account that put mutant cells are defective in filamentous growth irrespective of their utilization of proline, like the efg1 cph1 double mutant.
      22. 316-327: The results of the experiment described can only be interpreted as an effect of proline catabolism if the three strains (efg1 cph1; put1; put2) have similar growth rates as yeast cells in vitro. Why weren't the cells competed directly (efg1 cph1 vs put cells)?
      23. Fig 6: The logical order of the experiments, and in the text, is: 1) 4 h window, 2) 26 h window and then 3) ex vivo. The cartoon in 6B should be in this order as well.
      24. 337: it is not clear what the 'direct exposure...' is trying to tell us. Can this be made more explicit?
      25. 340-346: Here proteins with high proline content were used to ask if they could induce expression of PUT1 or PUT2 RNA and protein. This experiment is designed only to test the role of these proteins to induce utilization of nitrogen, as glucose is included in the medium. Given that these proline-rich proteins need to be lysed by proteases before they can be imported, and since no import pathways were tested, the results appear to tell us that mucin is more readily digested to peptides that contain proline, but why that is the case is not clear and how it relates to proline utilization is also not clear.
      26. 363-369: An alternative is that Put3 induces different proteins important for growth.
      27. 379-380: The conclusion for this paragraph is somewhat of an overstatement as there is no analysis of the degree to which proline utilization is a predictor of virulence. It simply shows that put mutants affect the ability to survive in neutrophils.
      28. Discussion: The statement that "S. cerevisiae" evolved in high sugar environments is debatable. The natural niche could well be forest soil and tree bark, or insect/wasp guts with arguably little glucose around.
      29. 469-470: How strong is the 'correlation' between the ability to utilize proline and virulence? Given that different mutants had different effects in different models, this seems like a very loose 'correlation'; it would be good to have some quantitative measures to make this claim.
      30. 500: Was the experiment done in larvae, and not in adult Drosophila? Fig 5 legend says flies and shows a picture of a fly, and larvae are only mentioned much later in the text.
      31. 512: Why is it presumed that proline accumulates in the mitochondria in put1 mutants? How strong is the presumption?
      32. 539: why are MMPs important for digestion of collagen? This is not clear at this point of the Discussion.
      33. 574: Concluding sentence of this paragraph seems unsubstantiated. There are at least two defects in put2 strains: hyphal growth and growth in general, presumably because of P5C accumulation.
      34. Fewer abbreviations would make the manuscript easier for non-experts to read. For example, P5C is not defined in the abstract. Furthermore, if an abbreviation is not used more than 3 times, it is not necessary to provide it (e.g., mammalian proteins in the last paragraph).

      Typos:

      1. 82: should read 'is restricted to the mitoch...'
      2. 102-103: should read 'to evade macrophages'
      3. Fig. S4F is mislabelled as Fig. S4G.

      Referees cross-commenting

      Overall, we stand by our initial assessment of the study. However, we were not aware of previous studies that investigated proline utilization in yeasts, as noted by Rev # 2 (https://onlinelibrary.wiley.com/doi/epdf/10.1002/yea.1845). That study suggests that using proline as an energy/carbon source is more widespread, beyond pathogenic yeasts. Further, the C. albicans strain they used for this study (ATCC 10231) was apparently unable to grow on proline in the quoted paper. In light of this, we think the authors should reference this study, tone down the claims about the clear correlation of pathogenicity and proline utilization, and address this apparent discrepancy with the indicated Candida albicans isolate. We note that our review considered this a paper mostly of interest to specialists.

      Significance

      1. The advance in this paper is conceptual for the proline utilization connection to virulence in a range of species and technical for the in vivo microscopy. Limitations are that the conceptual advance is based only on qualitative work in figure 1 and that the animal studies do not provide a conceptual advance, although the technical advance of in vivo visualization of kidney tissue is impressive and (to the knowledge of this reviewer) quite new as the only prior work was in mouse ears.
      2. The work fits well as an extension of the body of work from the corresponding author's lab with additions from the labs with expertise in models of infection.
      3. People interested in yeast metabolism and pathogenic yeast virulence will be the audience for this paper and as written it is for a specialized audience interested in pathogenic yeast metabolism and, perhaps, (although not mentioned at all in the text) for those who want to try PUT gene products as new drug targets.
      4. Reviewer expertise is in pathogenic yeast biology and yeast metabolism. Little expertise in high tech microscopy.
    1. Author Response

      Reviewer #1 (Public Review):

      Various parts of the premotor cortex have been implicated in choices underlying decision-making tasks. Further, norepinephrine has been implicated in modulating behavior during various decision-making tasks. Less work has been done on how noradrenergic modulation would affect M2 activity to alter decision-making, nor is it clear whether noradrenergic modulation effects on activity would differ between the male and female sexes.

      This manuscript addresses some of these questions.

      • In particular, clear sex differences in task engagement are seen.

      • May also show some interesting differences and distributions of β2 adrenergic receptors in M2 between males and females.

      We thank the reviewer for their summary of our findings and thoughtful critique of our manuscript. In our revised manuscript we have taken measures to address the reviewer’s comments in line (blue edits in text and revised figures) with direct responses outlined below. We believe these revisions improve the scientific rigor of our findings and provide relevant context for our studies. We hope that they have sufficiently addressed the reviewer’s concerns.

      Less clear is the specificity of systemic antagonism of β adrenergic receptors on the changes in M2 activity reported. As propranolol was given systemically, changes in M2 firing rates could also be due to broader circuit (indirect) activity changes. As it was not given locally, nor were local receptor populations manipulated, one is unable to make the conclusion that changes in neural activity are due to the direct effects of adrenergic receptors within M2 populations.

      We agree that propranolol-driven changes in anterior M2 activity may arise via multiple mechanisms, including direct action on the adrenoreceptors within M2, and indirect action via other regions that project to M2. Although locally activating inhibitory interneurons within M2 is sufficient to disrupt cue-guided action plans and behavior in a 2AFC task (Inagaki et al., 2018), our noradrenergic manipulation was not restricted to M2. We have clarified our conclusions and provided additional discussion to highlight that propranolol actions were multifaceted and that direct actions in M2 are likely working in concert with propranolol-mediated actions in other regions.

      Also not clear, is the contribution of M2 to this task, and whether the changes in M2 activity patterns observed are directly responsible for the behavioral disruptions measured.

      We have revised our introduction and discussion to more clearly outline the critical role of cue-guided action plans in M2 for successful behavior in 2AFC tasks. Suppression of cue-guided activity in M2 results in behavioral performance at near chance levels, similar to what we saw in females after propranolol (Guo et al., 2017; Inagaki et al., 2018; Li et al., 2016). Furthermore, targeted photostimulation of action plan encoding neurons in M2 is sufficient to drive behavioral responses (Daie et al., 2021). In our investigations it is plausible to expect propranolol related disruptions in other cognitive, sensory or motor regions. Based on the strong foundational evidence for M2 activity in 2AFC, the propranolol driven changes in anterior M2 in females, whether direct or indirectly mediated, are likely sufficient to drive behavioral disruptions in accuracy and/or trial completion.

      Reviewer #2 (Public Review):

      This paper by Rodbarg et al describes an interesting study on the role of beta noradrenergic receptors in action-related activity in the premotor cortex of behaving rats. This work is precious because even if the action of neuromodulatory systems in the cortex is thought to be critical for cognition, there is very little data to actually substantiate the theories. The study is well conducted and the paper is well written. I think, however, that the paper could benefit from several modifications since I can see 3 major issues:

      We thank the reviewer for their generous comments on the potential impact of our manuscript as well as their suggestions to improve this work. Below we outline responses to specific comments raised by the reviewer in addition to addressing them in the revised manuscript. We hope these responses sufficiently address the reviewer’s concerns.

      Both from a theoretical and from a practical point of view, the emphasis on 'cue-related' activity and the potential influence of NA on sensory processing is problematic. First, recent studies in rodents and primates have clearly demonstrated that LC activation is more closely related to actions than to stimulus processing (see Poe et al, 2020 for review).

      Indeed, during optimal performance the peaks of LC activity are larger when PETHs are aligned to action initiation rather than the cue itself (Clayton et al., 2004). This alignment resolves variability in decision processing times and omitted cues. Although LC responses align with action, they are evoked by, and occur after, cue presentation, with LC responses to visual cues occurring ~60 ms after presentation (Aston-Jones & Bloom, 1981). The same behavioral action without preceding task-relevant cues does not evoke an LC response (Rajkowski et al., 2004).

      In our current study cues initiate activity in anterior M2, this is our primary interest and where our electrodes are placed. The window between cue delivery and action completion hones in on our goal of investigating the role for β noradrenergic signaling in target cortical processing, rather than LC explicitly. In both NHP and rodents NE signaling (and evoked LC) promotes sustained cortical representations between cue onset and actions across cortical regions (dlPFC, S1) (Ramos & Arnsten, 2007; Vazey et al., 2018; Wang et al., 2007). In the current study we aligned neural data to either cue presentation (Figure 3) or action (lever press; Figure 4). Both presentations support a critical role for β adrenoreceptor signaling in suppressing irrelevant information, resolving and maintaining action plans. A unique feature of aligning the data to cue onset is that it allows us to see how the neural activity changes not only on completed trials (that end with a lever press) but also on omitted trials (which strongly increase after propranolol). We propose the reason we are seeing large increases in omitted trials is because β adrenoreceptor blockade either directly or indirectly prevents anterior M2 from resolving an action plan.

      Second, the analysis of neural activity around cue onset should be examined with spikes aligned on the action, since M2 is a motor region and raster plots suggest that activity is strongly related to action (I'll be more specific below).

      We agree that M2 shows important action plan activity, which we highlight throughout the manuscript. In cued tasks, M2 neurons have been shown to represent action plans starting at cue onset and continuing up to behavioral execution. Neural data were examined and results presented aligned to cue onset (illustrated in Figure 3) and aligned to action - lever press (illustrated in Figure 4). The impact of propranolol in diminishing action plan selection was similar in both action- and cue-aligned analyses.

      The distinction between neural activity and behavior or cognition is not always clear. I understand that spike count can be related to motor preparation or decision, but it should not be taken for granted that neuronal activity is action planning. The analysis should be clarified and the relation between neural activity, behavior, and potential hidden cognitive operations should be explicated more clearly.

      We have worked to clarify in our revised introduction, results and discussion the specifics of the known roles of neural activity in M2 in both action planning and decision making. We further expand that the neuronal activity in our study may reflect potential changes in cognitive processing and thus alter resultant behavioral outcomes.

      The sex difference is interesting, but at the moment it seems anecdotal. From a theoretical point of view, is there any ecological/ biological reason for a sex dependency of noradrenergic modulation of the cortex? Is there any background literature on sex differences in motor functions in rats, or in terms of NA action? If not, why does it matter (how does it change the way we should interpret the data?) From a practical point of view, is there a functional sex difference in absence of treatment, or is it that the drug has a distinct effect on males vs females? This has very distinct consequences, I think.

      We did not find overt differences in behavior in the absence of treatment. Only when noradrenergic function was challenged using propranolol did we identify functional sex differences. We agree that this has very distinct consequences – specifically it supports sex differences that can be revealed by perturbations of normal function. These functional sex differences may be a result of differences in the anatomy of central noradrenergic systems, a hypothesis further supported by our mRNA expression findings and existing literature on LC anatomy across species (Bangasser et al., 2011, 2016; Luque et al., 1992; Mulvey et al., 2018; Ohm et al., 1997; Pinos et al., 2001). Collectively these results have potential ramifications for understanding sex differences in disease prevalence and targeted treatments.

      Background literature supports some innate sex differences in motor function and executive function in rodents and humans. Of particular relevance to our investigation is an established difference in behavioral strategy, with females being more risk-averse than males (Grissom & Reyes, 2019). Ethologically, risk-averse strategies may support parental care roles, and increased inhibitory mechanisms may be selected for in females. Although this strategy was not directly tested in our study, the large increase in omissions after propranolol seen in females is in line with avoiding risk (incorrect choices) during uncertainty (disrupted neural signaling). As with other executive functions, the utilization of norepinephrine within the cortex, along with other neuromodulators and local microcircuit interactions, would all contribute to promoting risk-averse behavior.

      These issues could be clarified both in the introduction and in the discussion, but the authors might have a different view on what is theoretically relevant here. In the result section, however, I think that both the lack of specificity in the description of behavior and cognitive operation and the confusion between 'sensory' and 'motor' functions make it very difficult to figure out what is going on in these experiments, both at a behavioral and at a neurophysiological level. First, the description of the behavior in the task is clearly not sufficient, which makes the interpretation of the measures very difficult.

      We have made an effort to better specify the task and relevant behavioral operations in both the methods and results and have included a clearer task schematic (Figure 1A). We agree that the confusion between ‘sensory’ and ‘motor’ functions may make it more difficult to understand the findings in this study. Anterior M2 plays a unique role in representing motor/action plans that can be informed by sensory information. This integrative function creates difficulty in parsing the neural activity of anterior M2 as strictly motor, sensory or cognitive. In attempts to improve clarity we have expanded and highlighted relevant information on the known roles of M2 in the introduction and discussion.

      One possible interpretation of the effects of the drug is a decrease in motivation, for instance, due to a decrease in reward sensitivity or an increase in sensitivity to effort. But there are others. More importantly, none of these measures can be used to tease apart action preparation from action execution, even though the study is supposed to be about the former.

      Neural activity during action planning, prior to action execution is known to be an essential function of M2 (Barthas & Kwan, 2017; Gremel & Costa, 2013; Guo et al., 2017; Inagaki et al., 2018, 2022; Li et al., 2016; Siniscalchi et al., 2016; Sul et al., 2011; Wei et al., 2019) for optimal performance in 2AFC tasks. In all, we found that the representation/separation of opposing action plans (a well validated function of M2) prior to responses (lever press) is degraded after propranolol, especially in females. We have provided additional emphasis on these foundational studies throughout our revised manuscript.

      To minimize the impact of motivational factors, effort and reward size remain consistent within our task, and all trials require a random initiation hold prior to cue delivery. As described in our general response to the editor above (Figure 1, above), we investigated whether motivational changes may be reflected in our M2 recordings. PETHs from the first and last 10 trials within saline sessions did not identify potential motivation-related differences in anterior M2 activity. Similarly, across propranolol sessions the neural activity was consistent between early and late trials. We used early and late trials as there was a mild decrease in trial rate during saline sessions in both males and females, potentially indicative of motivation/reward sensitivity changes during these sessions. M2 neural responses consistently separated action plans (after saline) or failed to separate action plans (propranolol sessions).

      Also, but this is less critical: In Figures 2C and D, it looks like there is a bimodal distribution for the effect of propranolol in females. Is there something similar in the neuronal effects of the drug? And in the distribution of receptors? Can it be accounted for by hormonal cycles/ anything else?

      Although there is some clustering in behavioral outcomes all data passed normality assumption as appropriate. Propranolol treatments were not synchronized to hormonal cycles, and the data likely include animals at various hormonal stages. Similar clustering was not apparent in neuronal effects of propranolol, although propranolol increased variability in many measures.

      In a pilot experiment we did not see any difference in baseline performance on our 2AFC task across the hormonal cycle (diestrus, proestrus, estrus or metestrus) of females in any measure, including accuracy (F(3,33)=0.59, p=0.63, one-way ANOVA) and omissions (F(3,33)=0.51, p=0.68).

      The description of neural activity is also very superficial. In general, it is not clear how spike count measures have been extracted. For example, legend and figure C are not clear, is the (long) period of cue presentation included in the 'decision time'?? "Cues were presented at a variable interval 200-700ms after initiation and until animals left the well, 'Well Exit'. The time from cue onset to well exit was identified as the decision time (yellow)." Yet on the figure only the period after cue presentation is in yellow. This is critical because, given the duration of the cue, the animals are probably capable of deciding (to exit the well) before the cue turns off. Indeed, as shown in fig 2D, the animals can decide within about 500 ms. So to what extent is the 'cue response' actually a 'decision response'?

      We have clarified the task and spike count measurements in methods and added a revised task schematic. It is correct that the cues are available throughout the decision time (for up to 5 seconds or until well exit), and an action plan is generated before well exit/cues turn off as reflected by the separation of neural action plans (Fig 3, saline). Anterior M2 neurons maintain action plan representation from cue onset until the lever press under normal conditions (Fig 4, saline). These action plans encapsulate “cue responses” and “decision responses”. We have aligned neural data to discrete timestamps at either end of the window in which M2 processing is known to be critical, specifically between cues and actions (lever press) and focus on neural activity relative to those points. We refer to this activity throughout the manuscript as an ‘action plan’ as action planning functions of M2 activity have been well established in prior studies.

      When looking at figure 3A, there is clearly a pattern on the raster, a line going from top left to bottom right. If the trials are sorted chronologically, something is happening over time. If, as I suspect, trials are sorted by ascending response time, this raster is showing that what authors are calling a 'response to cues' is actually a response around action. Basically, if propranolol slows down reaction time, the spikes will be delayed from cue onset only because they remain locked to the action. Then the whole analysis and interpretation need to be reconsidered. But it might be for the best: as I mentioned earlier, recent work on LC activity has clearly emphasized its influence on motor rather than sensory processing (Poe et al, 2020).

      Figure 3A is a single neuron example, and data analyses focus on population-wide activity. Neural data is presented both aligned to cues, for all trials in which a cue was received, and aligned to lever press (action), for all trials on which a lever press occurred. In both cases, aligned to cue or aligned to action, the impact of propranolol is the same. β adrenoreceptor blockade reduces the separation of action plans in M2, severely so in females. However, a major finding is that females receive a cue but omit a large number of trials after propranolol; for this outcome the action does not occur. We propose this is due to the lack of action plan separation in anterior M2 (either directly or indirectly). When no behavioral response occurs, these trials cannot be aligned to action, yet we are still interested in the neural activity during the critical window between cue delivery and actions. We are not assigning this neural activity to sensory processing but are using this discrete sensory event within our trials (cue) to align the data, as there is substantial evidence that action plans in M2 arise after cue presentation in tasks such as ours where performance is guided by external cues.

      Fig 2D-F: it is hard to believe that the increase in firing rate induced by propranolol in females is not significant. Presumably, because the range of the median firing rate is so high in the first place, distribution (2E) really indicates an increase in firing. Maybe some other test? e.g paired t.test, or standardized values (z.score) to get rid of variability in firing across neurons?

      We agree that the session-wide firing rate appears rightward shifted in females after propranolol. As our recordings were taken on different days, several days apart, we cannot assume they are the same neurons for paired analyses. In our revised manuscript we evaluated these distributions using a Mann-Whitney test to increase power and decrease the impact of variability within the population. Previously we had used a Kolmogorov-Smirnov test. Using our new analysis, we can confirm that propranolol significantly increases session-wide firing rates in anterior M2 of females (p=0.027) but not males. This finding increases evidence for direct actions of propranolol within M2 and supports our hypothesis that propranolol leads to local disinhibition by reducing β noradrenergic signaling in interneurons and that without this noradrenergic tone anterior M2 is less efficient at suppressing irrelevant action plans.
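
      To make the unpaired comparison concrete, the following is a minimal sketch of a two-sided Mann-Whitney U test on two independent samples of firing rates. It is an illustration only: the software and settings actually used for the analysis are not stated here, and the function name, the normal approximation, and the example values are all assumptions.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test for two independent samples.

    Uses the normal approximation with a tie correction and no
    continuity correction. Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        # Every value is identical, so there is no evidence of a shift.
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value


# Hypothetical session-wide firing rates (Hz) from two separate sessions.
saline = [1.0, 2.0, 3.0]
propranolol = [4.0, 5.0, 6.0]
u_stat, p = mann_whitney_u(saline, propranolol)
```

      Because the test works on ranks of pooled values, it does not require the same neurons to be tracked across days, which is the constraint described above.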

      Along those lines, would it be worth looking for effects on specific populations (interneurons) which are sometimes characterized by thinner spikes and higher mean firing rates? Given the distribution of beta receptors RNA on interneurons, one would actually expect an effect of propranolol on the firing rate irrespective of task events. Or what is it that prevents the influence of propranolol on interneurons from changing the firing rate? In any case, one of the strengths of this study is the localization of beta receptors on specific neuronal populations in the cortex, so I think that the authors should really try to build on it and find something related to the neurophysiological effects. Otherwise, one cannot exclude the possibility that the behavioral effects are not related to the influence of the drug on these receptors in that region.

      Data were collected using stainless steel electrode arrays and our sample population of task related neurons is likely biased to pyramidal neurons, with a small number of fast spiking interneurons. We used validated spike waveform parameters of interneurons in premotor cortex (peak-to-trough ratio and duration; Giordano et al., 2023) in an attempt to isolate putative interneurons and found only a very small number of these cells in our recordings (n=5-7 per group). This population is too small to make any inferences about specific impacts. We have focused on the collective population activity of M2 as this is most strongly related to optimal action planning.

      You are correct that from the given findings we cannot conclusively show that the results found here are a result of propranolol acting solely within anterior M2. We have made sure to clarify throughout our revised manuscript that the behavioral and physiological changes we identified are a result of collective direct and indirect actions of propranolol.

      The conclusion that neuronal discrimination decreases because the proportion of neurons showing no effect increases is confusing (negative results, basically). It would be clearer if they were reporting the number of neurons that do show an effect, and presumably that this number shows a significant decrease.

      The reviewer is correct that the number of neurons that do show an effect (task-related activity) does significantly decrease with propranolol (from n=70 to 27 in females and n=71 to 48 in males). These n are now given adjacent to the proportions rather than at the end of the paragraph. Proportions were used for statistical analysis due to an overall decrease in the total number of units after propranolol. All PETHs presented are from neurons that show some task-related activity; these PETHs confirm that neural activity no longer effectively discriminates/separates action plans in M2.

      Figs 3F-I: a good proportion of neurons (at least 20%) show a significant encoding before cue onset. How is it possible? This raises the issue of noise level/ null hypothesis for this kind of repeated analysis. How did the author correct for multiple comparison issues?

      In response to reviews, we have altered the manner in which we identify the significantly modulated neurons to increase rigor and no longer include these figures or analyses. The proportion of neurons showing action plan encoding prior to cue onset was likely an artifact of how the data was analyzed and an insufficient correction for multiple comparisons, allowing inclusion of internally generated action plans in some neurons.

      The description of the action-related activity is globally confusing. Again, how can the authors discriminate between activity related to planning vs action itself? What is significant and what is not, in males vs females? What is being measured here? For example, a very unclear statement on line 238: "Propranolol primarily disrupted active inhibition of irrelevant action selection in M2 activity, reducing the ability to maintain action plan representation in M2, delaying lever press responses (Figure 4L, 4M)." What is 'active inhibition? What is an irrelevant action plan? What is selection? All of that should be defined using objective behavioral criteria and tested formally.

      We have changed our wording to clarify what we are describing and why we have chosen the words we have, and to ensure consistency and objectivity throughout the manuscript. Much of the wording we have used – for example action planning or action plan selection, are the words used in the literature to describe M2 neural activity. We call the activity in M2 action planning (either externally/cue guided or internally guided) because that is what has been previously demonstrated. In our task design and analysis we are tracking cue guided actions, as opposed to internally guided.

      We also separate the electrophysiology data as preferred and nonpreferred because the literature has shown individual M2 neurons show specific directional tuning as noted in our results, using the term ‘preferred’ encapsulates that tuning regardless of left/right direction. An example M2 neuron that increases activity for left cues and responses (preferred direction), will show active inhibition (low/negative z scores) on trials with right cues and responses (nonpreferred), other neurons would show the inverse relationship with direction.

      A primary impact of propranolol was the loss of negative z-scores for nonpreferred trials, i.e., neurons with a left preference that are usually inhibited on right trials were still firing, and vice versa. After propranolol, neurons continue to fire for an irrelevant action plan (for the opposite direction), and the resulting population activity is not significantly different for opposing cues/responses. Behavioral responses normally occur after opposing action plans have significantly separated in M2; collapsing action plans by preventing relevant signaling (Guo et al., 2017; Inagaki et al., 2018; Li et al., 2016) or by facilitating irrelevant signaling, as we see here with propranolol, leads to impairments in 2AFC performance.
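
      The preferred/nonpreferred labelling and the meaning of a negative z-score can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis pipeline used for the manuscript: the baseline window, the function names, and the firing rates are hypothetical.

```python
from statistics import mean, stdev


def zscore_trials(trial_rates, baseline_rates):
    """Z-score each trial's firing rate against a baseline distribution."""
    mu = mean(baseline_rates)
    sigma = stdev(baseline_rates)
    return [(rate - mu) / sigma for rate in trial_rates]


def label_direction_tuning(left_rates, right_rates, baseline_rates):
    """Label the direction with the higher mean z-score as 'preferred'.

    Returns the preferred direction plus the mean z-score on preferred
    and nonpreferred trials. A negative nonpreferred value corresponds
    to firing below baseline on trials for the opposite direction.
    """
    z_left = mean(zscore_trials(left_rates, baseline_rates))
    z_right = mean(zscore_trials(right_rates, baseline_rates))
    if z_left >= z_right:
        return {"preferred": "left", "z_preferred": z_left, "z_nonpreferred": z_right}
    return {"preferred": "right", "z_preferred": z_right, "z_nonpreferred": z_left}


# Hypothetical neuron: excited on left trials, suppressed on right trials.
tuning = label_direction_tuning(
    left_rates=[8.0, 9.0, 10.0],
    right_rates=[2.0, 3.0, 4.0],
    baseline_rates=[4.0, 5.0, 6.0],
)
```

      In this toy case the neuron is left-preferring with a negative nonpreferred z-score. The effect described above would correspond to the nonpreferred value moving toward or above zero.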

      Also, the description of the classifier analysis should be more thorough. Referencing the toolbox is not sufficient to understand what has been done.

      We have added additional explanation in both the methods and the description of the results to clarify the functions of the neural decoding toolbox and how we are using it to evaluate information encoding within M2. We have provided detail on how the algorithm was trained, how shuffled data was generated, and how we determined the significance of decoding accuracy.
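
      As a generic illustration of how decoding accuracy can be compared against shuffled data, the sketch below runs a label-permutation test around a simple classifier. It does not reproduce the toolbox or settings used in the manuscript: the nearest-centroid classifier, the leave-one-out scheme, the number of shuffles, and all names and values are assumptions chosen to keep the example short.

```python
import random


def loo_nearest_centroid_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    classes = sorted(set(labels))
    correct = 0
    for i, (x_i, y_i) in enumerate(zip(features, labels)):
        centroids = {}
        for cls in classes:
            # Build each class centroid without the held-out trial.
            members = [f for j, (f, l) in enumerate(zip(features, labels))
                       if l == cls and j != i]
            if members:
                centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
        predicted = min(
            centroids,
            key=lambda cls: sum((a - b) ** 2 for a, b in zip(x_i, centroids[cls])),
        )
        correct += predicted == y_i
    return correct / len(labels)


def permutation_test(features, labels, n_shuffles=200, seed=0):
    """Return (observed accuracy, p-value) against label-shuffled data."""
    rng = random.Random(seed)
    observed = loo_nearest_centroid_accuracy(features, labels)
    at_least_as_good = 0
    for _ in range(n_shuffles):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # breaks the link between activity and trial type
        if loo_nearest_centroid_accuracy(features, shuffled) >= observed:
            at_least_as_good += 1
    p_value = (at_least_as_good + 1) / (n_shuffles + 1)
    return observed, p_value


# Hypothetical two-neuron population: one neuron per preferred direction.
features = [[5.0 + 0.1 * k, 1.0] for k in range(10)] + \
           [[1.0, 5.0 + 0.1 * k] for k in range(10)]
labels = ["left"] * 10 + ["right"] * 10
accuracy, p = permutation_test(features, labels)
```

      Shuffling the trial labels preserves the firing-rate distribution while removing any relationship to the cue or response, so the shuffled accuracies estimate what decoding would achieve by chance.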

      Measuring Beta adrenoceptors is a great idea, and the results are interesting, especially the difference between neuron types. But again, how does that fit with neurophysiological results? Note, that since this is RNA measures, it should not be phrased as 'receptors' but 'receptors RNA' throughout. One possible interpretation of these anatomical results that cannot be reconciled with physiology is that protein expression at the membrane shows a distinct pattern.

      We have changed the references to β receptor expression to β receptor mRNA expression throughout the manuscript. Although mRNA provides a valuable proxy for adrenoreceptor production, as noted by the reviewer, protein expression at the membrane may differ. Reliable antibodies that allow quantitative analysis of membrane-bound adrenoreceptors in situ with co-labeling of specific cell types are limited. The goal of assessing mRNA expression within M2 was to determine if the functional sex differences we identified in M2 neurophysiology when manipulating β adrenoreceptor function could be mediated by basal differences in adrenoreceptors. The causal impact of differential mRNA expression in anterior M2 was not directly tested, but our findings provide preliminary evidence that adrenoreceptor regulation may differ across sexes. Our results provide a plausible avenue for differential sensitivity to β adrenoreceptor manipulation across sexes that may also be found in other brain regions.

      In conclusion, I think that this is a very interesting study and that the results are potentially relevant for a wide audience. But the paper would clearly benefit from revisions. If the authors could clearly identify a significant relationship between the action of NA on beta receptors on specific cortical neurons, at a physiological and behavioral level, that would be a seminal study. At the moment, the evidence is not convincing enough but the data suggest that it is the case.

      We thank the reviewer for the kind remarks. We have undertaken a number of new analyses, refined existing analysis and clarified our claims in the manuscript to improve rigor. Collectively our data reflect that the behavioral and neural deficits after systemic propranolol are likely due to both direct and indirect actions on M2. We believe this work is compelling and that it will inform future work investigating potential sex differences in central noradrenergic anatomy and functional sex differences after perturbations of noradrenergic signaling.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) What's the rationale for trypsinizing the tissue prior to mitochondrial isolation? This is not standard for subsequent proteomics analysis. This step will inevitably cause protein loss, especially for the post mitochondrial fractions (PMF). Treating samples with 0.01 µg/µL trypsin at 37 °C for 30 min is sufficient to partially digest a substantial portion of the proteome. If samples from different subjects were not of the same weight, then this partial digestion step may introduce artificial variability as variable proportions of proteins from different subjects would be lost during this step. In addition, the mitochondrial protein enrichment in the mito fraction, despite being statistically significant, does not look striking (Figure 1E, ~30% mitochondrial proteins in the mito fraction). As a comparison, Williams et al., MCP 2018 seem to have obtained high mitochondrial protein content in the mito fraction without trypsinizing the frozen quadriceps using a similar SWATH-MS-based approach.

      Trypsinisation of the tissue prior to mitochondrial isolation is based on previous work and a Nature Protocol (1, 2) which isolated mitochondria from skeletal muscle. The rationale is that it aids mechanical homogenisation of highly fibrous tissues such as quadriceps muscle by digesting extracellular matrix proteins. The trypsin/protein ratio used to aid in this process is at least 400 times lower than the amount of trypsin used for formal proteomic tryptic digestion. Three pieces of evidence suggest this step has a negligible effect on downstream proteomic analysis. First, because the trypsinisation buffer is detergent free, trypsin will only affect extracellular or exposed membrane proteins. Filtering our PMF dataset for proteins with ‘extracellular matrix’ gene ontology identifies at least 90 unique extracellular matrix proteins, indicating good retention of proteins susceptible to partial digestion. Second, the trypsin dose used is 50 times lower than the concentration used for passaging cultured cells, which retain viability after trypsinisation. Third, and contrary to the point raised by the reviewer, we observe less missingness in PMF samples compared to mitochondrial samples. We thank the reviewer for bringing the Williams et al. 2018 MCP paper to our attention. We note that mitochondrial enrichment between the two papers is comparable (~2-fold). To improve clarity line 408 now reads: “Whole quadriceps muscle samples were prepared as previously described with modification (99, 100). First, tissue was snap frozen with liquid nitrogen…” and line 95 reads: “Mitochondrial proteins were defined based on their presence in MitoCarta 3.0 (24) and consistent with previous work (25) were approximately two-fold enriched in the mitochondrial fraction relative to the PMF (Fig 1E).”

      (2) The authors mentioned that the proteomics data were Log2 transformed and median-normalized. Would it be possible to provide a bit more details on this? Were the subjects randomized?

      Samples were randomised prior to sample processing and mass spectrometry analysis. Because of possible variation in total protein content, it is critical to normalise protein intensities between samples. Median normalisation adjusts the samples so that they have the same median, thereby accounting for technical variation. Log2 transformation helps to achieve approximately normal distributions, which many downstream statistical tests assume. Line 471 now reads: “…to achieve normal distributions and account for technical variation in total protein.”
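
      The two steps described above can be sketched as follows. This is a minimal illustration, assuming that intensities are log2 transformed first and that each sample is then shifted so its median matches the median of all sample medians. The exact order and target median used in the manuscript are not stated here, and the function and sample names are hypothetical.

```python
import math
from statistics import median


def log2_median_normalise(samples):
    """Log2 transform intensities, then align every sample's median.

    `samples` maps sample name -> {protein name -> raw intensity}.
    Missing or non-positive intensities are skipped rather than imputed.
    """
    logged = {
        sample: {prot: math.log2(val) for prot, val in proteins.items()
                 if val is not None and val > 0}
        for sample, proteins in samples.items()
    }
    sample_medians = {s: median(vals.values()) for s, vals in logged.items()}
    # Assumed target: the median of the per-sample medians.
    target = median(sample_medians.values())
    return {
        s: {prot: x - sample_medians[s] + target for prot, x in vals.items()}
        for s, vals in logged.items()
    }


# Hypothetical samples where mouse_b was loaded with twice as much protein.
raw = {
    "mouse_a": {"P1": 2.0, "P2": 4.0, "P3": 8.0},
    "mouse_b": {"P1": 4.0, "P2": 8.0, "P3": 16.0},
}
normalised = log2_median_normalise(raw)
```

      After normalisation the two hypothetical samples have identical profiles, because the only difference between them was a global two-fold loading offset, which becomes a constant shift on the log2 scale.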

      (3) In Figure 1D, what were the numbers of mice the authors used for the CV comparisons in each group? Were they of similar age and sex? Were the differences in CV values statistically significant?

      The mitochondrial and PMF proteomes originated from the same quadriceps sample from the same mouse, and thus the age and sex are the same across both proteomes. After quality control, we had mitochondrial proteomes for 194 mice and PMF proteomes for 215 mice. The overall CV in the mitochondrial fraction was significantly greater than in the PMF, however whether the source of this variation is biological, or the result of mitochondrial isolation is unclear and as such we have avoided making a statement within the body of the manuscript. We have now more clearly described the nature of the samples in the revised manuscript and added sample sizes to figure 1F.

      (4) The authors stated in lines 155-157 that proteins negatively associated with the Matsuda index were further filtered by presence of their cis-pQTLs. Perhaps more explanations would be needed to justify this filtering criterion? Having a cis-pQTL would mean the protein abundance variation is explained by the variation in its coding gene, this however conceptually would not be relevant to its association with the Matsuda index. With the data that the authors have in hand, would it not be natural to align the Matsuda index QTL with the pQTLs (cis and trans if available), and/or to perform mediation analysis to examine causal relationships with statistical significance?

      The rationale for filtering by cis-pQTL was not to study the genetics of either Matsuda or associated proteins but rather to identify proteins that were more likely to be causally associated with Matsuda Index as opposed to adaptively associated. To clarify this line 165 now reads: “Filtering based on cis-pQTL presence was based on the rationale that if genetic variation can explain protein abundance differences between mice, then we can be confident that phenotype (Matsuda Index) is not driving the observed differences and therefore the protein-phenotype associations are likely causal. Importantly, this assumption can only be made for cis-acting pQTLs.” Previous work by Matthew et al. (see https://qtlviewer.jax.org/) has demonstrated that cis-pQTL have markedly higher LOD scores than trans-pQTLs, and our own unpublished work suggests that trans-pQTLs do not reproduce well between datasets. The reviewer rightfully suggests aligning protein QTL with those for Matsuda. This is our long-term goal but to identify genome wide significant peaks associated with altered Matsuda will require many more mice than studied here.
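
      The filtering logic can be sketched as below. This is an illustration under stated assumptions rather than the manuscript's implementation: the 2 Mbp cis window, the record fields, and the example proteins are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed window for calling a pQTL "cis"; not taken from the manuscript.
CIS_WINDOW_BP = 2_000_000


@dataclass
class ProteinRecord:
    name: str
    matsuda_effect: float          # sign gives the direction of association
    significant: bool              # association with Matsuda Index passed threshold
    gene_chrom: str
    gene_pos_bp: int
    pqtl_chrom: Optional[str] = None   # None when no pQTL was detected
    pqtl_pos_bp: Optional[int] = None


def has_cis_pqtl(rec, window_bp=CIS_WINDOW_BP):
    """True when the pQTL lies on the gene's chromosome within the window."""
    if rec.pqtl_chrom is None or rec.pqtl_pos_bp is None:
        return False
    same_chrom = rec.pqtl_chrom == rec.gene_chrom
    return same_chrom and abs(rec.pqtl_pos_bp - rec.gene_pos_bp) <= window_bp


def build_fingerprint(records):
    """Keep significant, negatively associated proteins that have a cis-pQTL."""
    return [r.name for r in records
            if r.significant and r.matsuda_effect < 0 and has_cis_pqtl(r)]


# Hypothetical proteins covering each way a candidate can be excluded.
records = [
    ProteinRecord("kept", -0.4, True, "chr1", 10_000_000, "chr1", 10_500_000),
    ProteinRecord("trans_only", -0.3, True, "chr1", 10_000_000, "chr7", 10_500_000),
    ProteinRecord("positive", 0.5, True, "chr2", 5_000_000, "chr2", 5_100_000),
    ProteinRecord("no_pqtl", -0.6, True, "chr3", 1_000_000),
]
fingerprint = build_fingerprint(records)
```

      Only the first hypothetical protein survives: the others are dropped for having a trans-acting pQTL, a positive association, or no pQTL at all.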

      (5) It seems a bit odd that the first half of the paper focused extensively on the authors' discoveries in the mitochondrial proteome, and how proteins involved in mitochondrial processes (such as complex I) were associated with Matsuda Index, but the final fingerprint list of insulin resistance, which contained 76 proteins, only had 7 mitochondrial proteins. Was this because many mitochondrial proteins were filtered out due to no cis-pQTL presenting?

      There are three reasons our fingerprint is lacking mitochondrial proteins: 1) there are more non-mitochondrial than mitochondrial proteins in the muscle proteome; 2) we focussed on negatively associated proteins, and as demonstrated in figure 2c, the mitochondrial proteome is enriched for positively associated proteins; 3) as implied by the reviewer, we filtered for pQTL presence, further reducing the number of mitochondrial proteins in our fingerprint. To improve clarity, line 170 now reads: “Low mitochondrial representation in the fingerprint is the result of selecting negatively associating proteins, and as seen (Figure 2C) previously, the mitochondrial proteome is enriched for positive contributors to insulin resistance.”

      (6) The authors found that thiostrepton-induced insulin resistance reversal effects were not through insulin signalling. It activated glycolysis but the mechanism of action was not clear. What are the proteins in the fingerprint list that led to identification of thiostrepton on CMAP?

      Is thiostrepton able to bind or change the expression of these proteins? Since thiostrepton was identified by searching the insulin resistance fingerprint protein list against CMAP, it would be rational to think that it exerts the biological effects by directly or indirectly acting on these protein targets.

      This is indeed the implication of our data. Because of the timescales involved it is unlikely that thiostrepton is changing fingerprint protein levels but could be binding to and inhibiting them. Searching the CMAP thiostrepton signature reveals ARHGDIB and NAGK as the fingerprint proteins with the most positive and negative fold-changes respectively perhaps suggesting they play a role in thiostrepton’s mechanism of action. Experiments are underway to test this hypothesis however these are beyond the scope of the current paper.

      Reviewer #2 (Public Review):

      Line 105: The observation that variance in respiratory proteins is stable while lipid pathways is variable is quite interesting. Is this due to lower overall levels of lipid metabolism enzymes (ex. do these differ substantially from similar pathways ranked from high-low abundance?).

      The relationship between coefficient of variation (CV) and relative abundance of proteins is important to consider. To address this, we have now also performed GSEA on proteins ranked from high to low relative abundance. These comparisons have been added to supplementary figure 1 and line 110 now reads: “As a control experiment, we also performed enrichment analysis on proteins ranked by LFQ relative abundance. High CV pathways (enriched for high CV proteins) tended to be lower in relative abundance (enriched for low relative abundance proteins) (Supplementary Fig 1a, b). However, many high variability pathways, lipid metabolism for example, were not enriched in either direction based on relative abundance suggesting differences in relative abundance do not fully explain pathway variability differences.”
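
      For reference, the per-protein coefficient of variation and the ranking used as input to an enrichment analysis can be sketched as follows. This assumes linear-scale (not log2) intensities and the sample standard deviation. The function names and values are hypothetical and do not come from the manuscript.

```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """CV = sample standard deviation divided by the mean."""
    return stdev(intensities) / mean(intensities)


def rank_proteins(protein_intensities):
    """Return protein names ranked by CV and by mean abundance (high to low)."""
    cvs = {p: coefficient_of_variation(v) for p, v in protein_intensities.items()}
    abundances = {p: mean(v) for p, v in protein_intensities.items()}
    by_cv = sorted(cvs, key=cvs.get, reverse=True)
    by_abundance = sorted(abundances, key=abundances.get, reverse=True)
    return by_cv, by_abundance


# Hypothetical intensities across three mice.
intensities = {
    "stable_abundant": [100.0, 101.0, 99.0],
    "variable_scarce": [1.0, 2.0, 3.0],
}
by_cv, by_abundance = rank_proteins(intensities)
```

      Comparing the two rankings is what allows variability to be separated from abundance: a pathway that ranks high on CV without ranking low on abundance is variable for reasons other than being scarce.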

      Line 154: the 664 associations are impressive and potentially informative. It would be valuable to know which of these co-map to the same locus - either to distinguish linkage in a 2mb window or identify any cis-proteins which directly exert effects in trans-

      To assess this, we have analysed pQTL position relative to gene position to generate a ‘hotspot’ plot. We have also generated a histogram of this pQTL density (in a 2 Mbp window) and added these figures to figure 3. We did not detect any obvious pQTL hotspots, and the distribution of pQTLs across the genome appears fairly uniform. Line 159 now reads: “These were distributed across the genome and were predominately cis acting (Figure 3A)...”

      Line 194: Cross-platform validation of the CMAP fingerprint results is an admirable set of validations. It might be good to know general parameters like how many compounds were shared/unique for each platform. Also the concordance between ranking scores for significant and shared compounds.

      The Connectivity Map (CMap) query included 5163 compounds, the Prestwick library included 1120, and the overlap was 420. We have added these comparisons to supplementary figure 2. Supplementary figure 2 now also contains a comparison of CMap scores between overlapping compounds (found in CMap and the Prestwick library) against all significant compounds identified by CMap (supplementary figure 2b). Interestingly, compounds present in both platforms scored higher on average, suggesting the Prestwick library captures a significant proportion of highly scoring CMap candidates. Line 206 now reads: “In total, 420 compounds were found across both platforms, and these consensus compounds captured a significant proportion of highly scoring CMap compounds (Supplementary Figure 2A, B).”

      Line 319: Another consideration in the molecular fingerprint is how unique these are for muscle. While studies evaluating gene expression have shown that many cis-eQTLs are shared across tissues, to my knowledge, this hasn't been performed systematically for pQTLs. Therefore, consider adding a point to the discussion pointing out that some of the proteins might be conserved pQTLs whereas others which would be more relevant here present unique druggable targets in muscle.

      To examine tissue specificity, we determined whether our skeletal muscle fingerprint proteins were detected and contained a pQTL in two metabolically important tissues, liver and adipose. Despite detecting almost all the fingerprint proteins in both adipose and liver tissue, they were depleted for pQTL compared to skeletal muscle. These data have now been added to figure 3c. Line 172 now reads: “To assess the tissue specificity of our fingerprint we searched for the same proteins in metabolically important adipose and liver tissues. Despite detecting 94% and 82% of muscle fingerprint proteins across each tissue respectively, both adipose and liver were depleted for pQTL presence (Figure 3C) suggesting that regulation of our fingerprint protein abundance is specific to skeletal muscle.”

      Line 332: These are fascinating observations. 1, that in general insulin signaling and ampk were not themselves shown as top-ranked enrichments with matsuda and that this was sufficient to alter glucose metabolism without changes in these pathways. While further characterization of this signaling mechanism is beyond the scope of this study, it would be good to speculate as to additional signaling pathways that are relevant beyond ROS (ex. CNYP2 and others)

      We have now added further discussion to the manuscript to address this point. Line 347 now reads: “Aside from glycolysis, other pathways may be involved in enhancing insulin sensitivity. For example, the negatively associated protein ARHGDIA (Figure 2F) is a potent negative regulator of insulin sensitivity, and our fingerprint of insulin resistance contained its homologue ARHGDIB. Both ARHGDIA and ARHGDIB have been reported to inhibit the insulin action regulator RAC1 thus lowering GLUT4 translocation and glucose uptake. Further investigations may uncover a role for thiostrepton in modulating the RAC1 signalling pathway via ARHGDIB.”

      Line 314: Remove the statement: "While this approach is less powerful than QTL co-localisation for identifying causal drivers,", as I don't believe that this has been demonstrated. Clearly, the authors provide a sufficient framework to pinpoint causality and produce an actionable set of proteins.

      We have edited line 314, which now reads: “Moreover, our approach has the major advantage that it requires far fewer mice to obtain meaningful outcomes (222 mice in this study) compared to that required for genetic mapping of complex traits like Matsuda Index.”

      Line 346: I would highlight one more appeal of the approach adopted by the authors. Given that these compound libraries were prioritized from patterns of diverse genetics, these observations are inherently more-likely to operate robustly across target backgrounds.

      This point is further supported by our thiostrepton results in both C57BL/6J and BXH9 mice. Line 317 now reads: “Furthermore, because we have used genetically diverse datasets (DOz mice and multiple cell lines in Connectivity Map) our findings are likely robust across diverse target backgrounds.”

      Line 434: I might have missed but can't seem to find where the muscle data are available to researchers. Given the importance and novelty of these studies, it will be important to provide some way to access the proteomic data.

      These data are now available via the ProteomeXchange Consortium. Line 465 now reads: “The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (104) partner repository with the dataset identifier PXD042277.”

      1. Frezza C, Cipolat S, Scorrano L. Organelle isolation: functional mitochondria from mouse liver, muscle and cultured fibroblasts. Nat Protoc. 2007;2(2):287-95.

      2. Acin-Perez R, Benador IY, Petcherski A, Veliova M, Benavides GA, Lagarrigue S, et al. A novel approach to measure mitochondrial respiration in frozen biological samples. The EMBO Journal. 2020;39(13):e104073.

      3. Chick JM, Munger SC, Simecek P, Huttlin EL, Choi K, Gatti DM, et al. Defining the consequences of genetic variation on a proteome-wide scale. Nature. 2016;534(7608):500-5.

      4. Gatti DM, Svenson KL, Shabalin A, Wu L-Y, Valdar W, Simecek P, et al. Quantitative Trait Locus Mapping Methods for Diversity Outbred Mice. G3 Genes|Genomes|Genetics. 2014;4(9):1623-33.

    1. Reviewer #2 (Public Review):

      Accumulating data suggests that the presence of immune cell infiltrates in the meninges of the multiple sclerosis brain contributes to the tissue damage in the underlying cortical grey matter by the release of inflammatory and cytotoxic factors that diffuse into the brain parenchyma. However, little is known about the identity and direct and indirect effects of these mediators at a molecular level. This study addresses the vital link between an adaptive immune response in the CSF space and the molecular mechanisms of tissue damage that drive clinical progression. In this short report the authors use a spatial transcriptomics approach using Visium Gene Expression technology from 10x Genomics, to identify gene expression signatures in the meninges and the underlying brain parenchyma, and their interrelationship, in the PLP-induced EAE model of MS in the SJL mouse. MRI imaging using a high field strength (11.7T) scanner was used to identify areas of meningeal infiltration for further study. They report, as might be expected, the upregulation of genes associated with the complement cascade, immune cell infiltration, antigen presentation, and astrocyte activation. Pathway analysis revealed the presence of TNF, JAK-STAT and NFkB signaling, amongst others, close to sites of meningeal inflammation in the EAE animals, although the spatial resolution is insufficient to indicate whether this is in the meninges, grey matter, or both.

      UMAP clustering illuminated a major distinct cluster of upregulated genes in the meninges and smaller clusters associated with the grey matter parenchyma underlying the infiltrates. The meningeal cluster contained genes associated with immune cell functions and interactions, cytokine production, and action. The parenchymal clusters included genes and pathways related to glial activation, but also adaptive/B-cell mediated immunity and antigen presentation. This again suggests a technical inability to resolve fully between the compartments as immune cells do not penetrate the pial surface in this model or in MS. Finally, a trajectory analysis based on distance from the meningeal gene cluster successfully demonstrated descending and ascending gradients of gene expression, in particular a decline in pathway enrichment for immune processes with distance from the meninges.

      Although these results confirm what we already know about processes involved in the meninges in MS and its models and gradients of pathology in sub-pial regions, this is the first to use spatial transcriptomics to demonstrate such gradients at a molecular level in an animal model that demonstrates lymphoid like tissue development in the meninges and associated grey matter pathology. The mouse EAE model being used here does reproduce many, although not all, of the pathological features of MS and the ability to look at longer time points has been exploited well. However, this particular spatial transcriptomics technique cannot resolve at a cellular level and therefore there is a lot of overlap between gene expression signatures in the meninges and the underlying grey matter parenchyma.

      The short nature of this report means that the results are presented and discussed in a vague way, without enough molecular detail to reveal much information about molecular pathogenetic mechanisms.

      The trajectory analysis is a good way to explore gradients within the tissues and the authors are to be applauded for using this approach. However, the trajectory analysis does not tell us much if you only choose 2 genes that you think might be involved in the pathogenetic processes going on in the grey matter. It might be more useful to choose some genes involved in pathogenetic processes that we already know are involved in the tissue damage in the underlying grey matter in MS, for which there is already a lot of literature, or genes that respond to molecules we know are increased in MS CSF, although the animal models may be very different. Why were C3 and B2m chosen here?

      Strengths:<br /> - The mouse model does exhibit many of the features of the compartmentalized immune response seen in MS, including the presence of meningeal immune cell infiltrates in the central sulcus and over the surface of the cortex, with the presence of FDC's HEVs PNAd+ vessels and CXCL13 expression, indicating the formation of lymphoid like cell aggregates. In addition, disruption of the glia limitans is seen, as in MS. Increased microglial reactivity is also present at the pial surface.<br /> - Spatial transcriptomics is the best approach to studying gradients in gene expression in both white matter and grey matter and their relationship between compartments.<br /> - It would be useful to have more discussion of how the upregulated pathways in the two compartments fit with what we know about the cellular changes occurring in both, for which presumably there is prior information from the group's previous publications.

      Limitations:<br /> - EAE in the mouse is not MS and may be far removed when one considers molecular mechanisms, especially as MS is not a simple anti-myelin protein autoimmune condition. Therefore, this study could be following gene trajectories that do not exist in MS. This needs a significant amount of discussion in the manuscript if the authors suggest that it is mimicking MS.<br /> - The model does not have the cortical subpial demyelination typical of MS and it is unknown whether neuronal loss occurs in this model, which is the main feature of cytokine-mediated neurodegeneration in MS. If it does not then a whole set of genes will be missing that are involved in the neuronal response to inflammatory stimuli that may be cytotoxic.<br /> - Visium technology does not get down to single cell level and does not appear to allow resolution of the border between the meninges and the underlying grey matter.<br /> - Neuronal loss in the MS cortex is independent of demyelination and therefore not related to remyelination failure. There does not appear to be any cortical grey matter demyelination in these animals, so it is difficult to relate any of the gene changes seen here to demyelination.<br /> - No mention of how the ascending and descending patterns of gene expression may be due to the gradient of microglial activation that underlies meningeal inflammation, which is a big omission.

    2. Author Response:

      We thank Reviewer #1 for their positive assessment of our work.

      Reviewer #2 (Public Review):

      […] Although these results confirm what we already know about processes involved in the meninges in MS and its models and gradients of pathology in sub-pial regions, this is the first to use spatial transcriptomics to demonstrate such gradients at a molecular level in an animal model that demonstrates lymphoid like tissue development in the meninges and associated grey matter pathology. The mouse EAE model being used here does reproduce many, although not all, of the pathological features of MS and the ability to look at longer time points has been exploited well. However, this particular spatial transcriptomics technique cannot resolve at a cellular level and therefore there is a lot of overlap between gene expression signatures in the meninges and the underlying grey matter parenchyma.

      We appreciate the reviewer’s concise summary and comments on our manuscript. We agree that the Visium spatial sequencing technology we applied is limited in its resolution and cannot precisely distinguish individual cells or anatomic regions. For that reason, there is undoubtedly some overlap between gene expression signatures in the meninges and underlying parenchyma, particularly in spots on the borders of the meningeal inflammation clusters. However, we believe that the majority of meningeal inflammation (“cluster 11”) spots are indeed in the meninges and represent the spatial transcriptome of that niche. To support this, in the revised manuscript we will provide H&E images with the UMAP clusters overlaid to demonstrate the anatomic borders that correlate with the clusters.

      The short nature of this report means that the results are presented and discussed in a vague way, without enough molecular detail to reveal much information about molecular pathogenetic mechanisms.

      We thank the reviewer for this comment. The goal of this work is to transcriptomically characterize the spatial relationship between areas of meningeal inflammation and the underlying parenchyma. While we agree that mechanistic studies are needed to further evaluate the role of presented signaling pathways, those experiments are beyond the scope of this brief report.

      The trajectory analysis is a good way to explore gradients within the tissues and the authors are to be applauded for using this approach. However, the trajectory analysis does not tell us much if you only choose 2 genes that you think might be involved in the pathogenetic processes going on in the grey matter. It might be more useful to choose some genes involved in pathogenetic processes that we already know are involved in the tissue damage in the underlying grey matter in MS, for which there is already a lot of literature, or genes that respond to molecules we know are increased in MS CSF, although the animal models may be very different. Why were C3 and B2m chosen here?

      We appreciate the reviewer’s points here. C3 and B2m were chosen as examples of genes that have differential fit to the gradient descending pattern to assist the reader in interpreting subsequent gene set trajectory analysis. However, we agree that there are many other genes of interest and will expand the number of genes displayed in our revised manuscript. 

      Strengths: <br /> - The mouse model does exhibit many of the features of the compartmentalized immune response seen in MS, including the presence of meningeal immune cell infiltrates in the central sulcus and over the surface of the cortex, with the presence of FDC's HEVs PNAd+ vessels and CXCL13 expression, indicating the formation of lymphoid like cell aggregates. In addition, disruption of the glia limitans is seen, as in MS. Increased microglial reactivity is also present at the pial surface. <br /> - Spatial transcriptomics is the best approach to studying gradients in gene expression in both white matter and grey matter and their relationship between compartments. <br /> - It would be useful to have more discussion of how the upregulated pathways in the two compartments fit with what we know about the cellular changes occurring in both, for which presumably there is prior information from the group's previous publications.

      Limitations: <br /> - EAE in the mouse is not MS and may be far removed when one considers molecular mechanisms, especially as MS is not a simple anti-myelin protein autoimmune condition. Therefore, this study could be following gene trajectories that do not exist in MS. This needs a significant amount of discussion in the manuscript if the authors suggest that it is mimicking MS. <br /> - The model does not have the cortical subpial demyelination typical of MS and it is unknown whether neuronal loss occurs in this model, which is the main feature of cytokine-mediated neurodegeneration in MS. If it does not then a whole set of genes will be missing that are involved in the neuronal response to inflammatory stimuli that may be cytotoxic. <br /> - Visium technology does not get down to single cell level and does not appear to allow resolution of the border between the meninges and the underlying grey matter. <br /> - Neuronal loss in the MS cortex is independent of demyelination and therefore not related to remyelination failure. There does not appear to be any cortical grey matter demyelination in these animals, so it is difficult to relate any of the gene changes seen here to demyelination. <br /> - No mention of how the ascending and descending patterns of gene expression may be due to the gradient of microglial activation that underlies meningeal inflammation, which is a big omission.

      We thank the reviewer for their insightful comments on the strengths and limitations of our study. The SJL EAE model we use in this paper is certainly not a perfect model of meningeal inflammation in MS (indeed, we believe that no such animal model exists), but it does recapitulate several key features of human disease as described by the reviewer. Spatial transcriptomics of cortical grey matter lesions and overlying meninges of samples derived from patients with MS would be ideal, though access to this tissue is highly limited. In the revised manuscript we will include more detailed discussion of the limitations in applying these findings to MS. However, in addition to potential implications for MS research, our data contribute more generally to understanding of meningeal inflammation and penetrance of inflammation into brain tissue.

      We acknowledge that sub-pial neuronal loss has not been assessed in SJL EAE, and if present it would increase the relevance of this model to neurodegeneration. We are currently working to assess this.

      We agree with the reviewer that Visium technology is limited in its ability to discriminate individual cells, as discussed above (2.2).

      We agree that gene expression by activated microglia is likely a major driver of the transcriptomic changes observed in the parenchyma, and thank the reviewer for highlighting this. We will add discussion of this to our revised manuscript, and intend to generate additional data regarding the contribution of subpial microglial activation to the measured transcriptomic changes.

      Finally, we thank Reviewer #3 for their assessment of our work.

    1. Author Response

      eLife assessment:

      Trypanosoma brucei evades mammalian humoral immunity through the expression of different variant surface glycoprotein genes. In this fundamental paper, the authors extend previous observations that TbRAP1 both interacts with PIP5pase and binds PI(3,4,5)P3, indicating a role for PI(3,4,5)P3 binding and suggesting that antigen switching is signal dependent. While much of the evidence is compelling, one reviewer suggested that the work would benefit from further controls.

      We appreciate the evaluation of the work and agree that the findings substantially advance our understanding of antigenic variation. A detailed response to the public review is included below, which addresses and clarifies the issues raised by the reviewers, including those concerning controls. We also want to highlight the comment by Reviewer #3 “The methods used in the study are rigorous and well-controlled…. their results support the conclusions made in the manuscript.”. We hope this and our comments will help address the issue of controls in this eLife statement.

      Reviewer #1 (Public Review):

      Trypanosoma brucei undergoes antigenic variation to evade the mammalian host’s immune response. To achieve this, T. brucei regularly expresses different VSGs as its major surface antigen. VSG expression sites are exclusively subtelomeric, and VSG transcription by RNA polymerase I is strictly monoallelic. It has been shown that T. brucei RAP1, a telomeric protein, and the phosphoinositol pathway are essential for VSG monoallelic expression. In previous studies, Cestari et al. (ref. 24) have shown that PIP5pase interacts with RAP1 and that RAP1 binds PI(3,4,5)P3. RNAseq and ChIPseq analyses have been performed previously in PIP5pase conditional knockout cells, too (ref. 24). In the current study, Touray et al. did similar analyses except that catalytic dead PIP5pase mutant was used and the DNA and PI(3,4,5)P3 binding activities of RAP1 fragments were examined. Specifically, the authors examined the transcriptome profile and did RAP1 ChIPseq in PIP5pase catalytic dead mutant. The authors also expressed several C-terminal His6-tagged RAP1 recombinant proteins (full-length, aa1-300, aa301-560, and aa 561-855). These fragments’ DNA binding activities were examined by EMSA analysis and their phosphoinositides binding activities were examined by affinity pulldown of biotin-conjugated phosphoinositides. As a result, the authors confirmed that VSG silencing (both BES-linked and MES-linked VSGs) depends on PIP5pase catalytic activity, but the overall knowledge improvement is incremental. The most convincing data come from the phosphoinositide binding assay as it clearly shows that N-terminus of RAP1 binds PI(3,4,5)P3 but not PI(4,5)P2, although this is only assayed in vitro, while the in vivo binding of full-length RAP1 to PI(3,4,5)P3 has been previously published by Cestari et al (ref. 24) already. 
Considering that many phosphoinositides exert their regulatory role by modulating the subcellular localization of their bound proteins, it is reasonable to hypothesize that binding to PI(3,4,5)P3 can remove RAP1 from the chromatin. However, no convincing data have been shown to support the author’s hypothesis that this regulation is through an “allosteric switch”. Therefore, the title should be revised.

      We appreciate the reviewer’s detailed evaluation of our work. There are a few general comments that we would like to clarify. We will break them into three points. All data included here are new and were not previously published.

      i) “RNAseq and ChIPseq analyses have been performed previously …(ref. 24).” Reference 24 is Cestari et al. 2019, Mol Cell Biol. Neither we nor others have published ChIP-seq of RAP1 in T. brucei. Previous work showed ChIP-qPCR, which analyses specific loci. The ChIP-seq shows genome-wide binding sites of RAP1, and new findings are shown here, including binding sites in the BES, MESs, and other genomic loci such as centromeres. We also identified DNA sequence bias defining RAP1 binding sites (Fig 2A). We also show by ChIP-seq how RAP1 binding to these loci changes upon expression of catalytically inactive PIP5Pase. As for the RNA-seq, this is also the first time we show RNA-seq of T. brucei expressing catalytically inactive PIP5Pase, which establishes that the regulation of VSG silencing and switching is dependent on PIP5Pase enzyme catalysis, i.e., PI(3,4,5)P3 dephosphorylation. To improve clarity in the manuscript, we edited page 4, line 122, as follows: “We showed that RAP1 binds telomeric or 70 bp repeats (24), but it is unknown if it binds to other ES sequences or genomic loci.”

      ii) “The in vivo binding of full-length RAP1 to PI(3,4,5)P3 has been previously published by Cestari et al. (ref. 24) already.”. We published in reference 24 that RAP1-HA can bind agarose bead-conjugated synthetic PI(3,4,5)P3. Here, we were able to measure T. brucei endogenous PI(3,4,5)P3 associated with RAP1-HA (Fig 4F). Moreover, we showed that the binding of endogenous PI(3,4,5)P3 to RAP1-HA is about 100-fold higher when PIP5Pase is catalytically inactive than with WT PIP5Pase. The data establish that in vivo endogenous PI(3,4,5)P3 binds to RAP1-HA and how the binding changes in cells expressing mutant PIP5Pase; these data are new and relevant to our conclusions.

      iii) “no convincing data have been shown to support the author’s hypothesis that this regulation is through an “allosteric switch””. We show here in vitro and in vivo data supporting the conclusion. We show that PI(3,4,5)P3 binds to the N-terminus of rRAP1-His with a calculated Kd of about 20 µM (Fig 4B-E, Table 1). In contrast, we show by EMSA and binding kinetics by microscale thermophoresis that rRAP1-His binds to 70 bp and telomeric repeats via protein regions encompassing the Myb (central) or Myb-L domains (C-terminal) but not the N-terminus containing the VHP domain (Fig 3C-G, and Fig S5). Using microscale thermophoresis, we also show that rRAP1-His binds to 70 bp and telomeric repeats with Kd of 10 and 24 nM, respectively (Fig 3 and Table 1). Notably, we show that 30 µM of PI(3,4,5)P3, but not PI(4,5)P2 (used as a control), disrupts rRAP1-His binding to 70 bp and telomeric repeats, changing Kds to about 188 and 155 nM, respectively (Fig 5A-C). We also show that PI(3,4,5)P3 does not disrupt the binding of rRAP1-His fragments (Myb or Myb-L) without the N-terminus domain (Fig S5), implying that binding of PI(3,4,5)P3 to the RAP1 N-terminus is required for displacement of RAP1 DNA binding domains (Myb and Myb-L) from telomeric and 70 bp repeats, and that PI(3,4,5)P3 is not competing for Myb or Myb-L binding to DNA. Moreover, we show that RAP1-HA binding to 70 bp and telomeric repeats in vivo is displaced in T. brucei cells expressing catalytically inactive PIP5Pase (Fig 5D-G), which we show results in RAP1-HA binding about 100-fold more endogenous PI(3,4,5)P3 than in T. brucei expressing WT PIP5Pase (Fig 4F). The in vivo data agree with the in vitro data. The data show a typical allosteric regulator system, in which binding of a ligand to one site of the protein, here PI(3,4,5)P3 binding to the RAP1 N-terminus, affects the binding of other domains (RAP1 Myb and Myb-L domains) to DNA.
To improve the clarity of the title, we will change it in the revised version to imply a direct role of PI(3,4,5)P3 regulation of RAP1 in the process. This will provide more specific information to the readers and address the concern of the reviewer related to the “allosteric switch”. The new title will be: PI(3,4,5)P3 allosteric regulation of RAP1 controls antigenic switching in trypanosomes
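
To put the quoted dissociation constants in perspective, the following minimal sketch computes the fold change in Kd and the expected drop in DNA occupancy. It assumes a simple 1:1 hyperbolic binding model with protein in excess of DNA, which is an illustrative assumption and not necessarily the fit used in the manuscript. The function name `fraction_bound` and the 50 nM protein concentration are hypothetical; only the four Kd values come from the response above.

```python
# Illustrative only: a simple 1:1 hyperbolic binding model. This is an
# assumption for the sketch, not necessarily the model fitted by the authors.

def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Fraction of DNA bound at a given free protein concentration (nM)."""
    return protein_nM / (kd_nM + protein_nM)

# Kd values (nM) quoted in the response: (without, with 30 µM PI(3,4,5)P3).
KD_NM = {
    "70 bp repeats": (10.0, 188.0),
    "telomeric repeats": (24.0, 155.0),
}

# Hypothetical rRAP1 concentration, chosen only for illustration.
PROTEIN_NM = 50.0

for dna, (kd_free, kd_pip3) in KD_NM.items():
    fold = kd_pip3 / kd_free
    before = fraction_bound(PROTEIN_NM, kd_free)
    after = fraction_bound(PROTEIN_NM, kd_pip3)
    print(f"{dna}: Kd increases {fold:.1f}-fold; "
          f"bound fraction {before:.2f} -> {after:.2f}")
```

Under these assumptions the Kd rises 18.8-fold for 70 bp repeats and about 6.5-fold for telomeric repeats, so at any fixed protein concentration a smaller fraction of DNA is occupied in the presence of PI(3,4,5)P3.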

      There are serious concerns about many conclusions made by Touray et al., according to their experimental approaches:

      1) The authors have been studying RAP1’s chromatin association pattern by ChIPseq in cells expressing a C-terminal HA tagged RAP1. According to data from tryptag.org, RAP1 with an N-terminal or a C-terminal tag does not seem to have identical subcellular localization patterns, suggesting that adding tags at different positions of RAP1 may affect its function. It is therefore essential to validate that the C-terminally HA-tagged RAP1 still has its essential functions. However, this data is not available in the current study. RAP1 is essential. If RAP1-HA still retains its essential functions, cells carrying one RAP1-HA allele and one deleted allele are expected to grow the same as WT cells. In addition, these cells should have the WT VSG expression pattern, and RAP1-HA should still interact with TRF. Without these validations, it is impossible to judge whether the ChIPseq data obtained on RAP1-HA reflect the true chromatin association profile of RAP1.

      Tryptag data show both N- and C-terminally tagged RAP1 with nuclear localization in procyclic forms, although there are differences in signal intensities in the images (http://tryptag.org/?id=Tb927.11.370). It is important to note that Tryptag data is from procyclic forms, and DNA constructs are not validated for their integration in the correct locus. As for the RAP1-HA localization in bloodstream forms, we demonstrated that C-terminally HA-tagged RAP1 co-localizes with telomeres by a combination of immunofluorescence and fluorescence in situ hybridization (Cestari and Stuart, 2015, PNAS), and RAP1-HA co-immunoprecipitates telomeric and 70 bp repeats (Cestari et al. 2019 Mol Cell Biol). We also showed by immunoprecipitation and mass spectrometry that HA-tagged RAP1 interacts with nuclear and telomeric proteins, including PIP5Pase (Cestari et al. 2019). Others have also tagged T. brucei RAP1 in bloodstream forms with HA without disrupting its nuclear localization (Yang et al. 2009, Cell; Afrin et al. 2020, Science Advances). As for the experiment suggested by the reviewer, there is no guarantee that cells lacking one allele of RAP1 will behave as wildtype, i.e., normal growth and repression of VSG genes. Also, less than 90% of T. brucei TRF was reported to interact with RAP1 (Yang et al. 2009, Cell), which might be indirect via their binding to telomeric DNA repeats rather than direct protein-protein interactions.

      2) Touray et al. expressed and purified His6-tagged recombinant RAP1 fragments from E. coli and used these recombinant proteins for EMSA analysis: The His6 tag has been used for purifying various recombinant proteins. It is most likely that the His6 tag itself does not convey any DNA binding activities. However, using His6-tagged RAP1 fragments for EMSA analysis has a serious concern. It has been shown that His6-tagged human RAP1 protein can bind dsDNA, but hRAP1 without the His6 tag does not. It is possible that RAP1 proteins in combination with the His6 tag can exhibit certain unnatural DNA binding activities. To be rigorous, the authors need to remove the His6 tag from their recombinant proteins before the in vitro DNA binding analyses are performed. This is a standard procedure for many in vitro assays using recombinant proteins.

      We show in Fig 3C-G that His-tagged full-length rRAP1 does not bind to scrambled telomeric dsDNA sequences, which indicates that His-tagged rRAP1 does not bind nonspecifically to DNA. Moreover, in Fig 3G, we show that His-tagged rRAP1(aa 1-300) also does not bind to 70 bp or telomeric repeats. In contrast, full-length His-tagged rRAP1, rRAP1(aa 301-560), or rRAP1(aa 561-855) bind to 70 bp or telomeric repeats (Fig 3C-G). Since all proteins were His-tagged, the His tag cannot be responsible for the DNA binding.

      As for the statement that human rRAP1-His has nonspecific DNA binding properties, we could not find a reference for this statement; we cannot compare it without knowing the details of the experiment. Biochemical assays can result in nonspecific binding depending on binding/buffer conditions. Also, human and T. brucei RAP1 share only 15% amino acid identity; nonspecific binding to DNA could be specific to human RAP1.

      3) It is unclear why Nanopore sequencing was used for RNAseq and ChIPseq experiments. The greatest benefit of Nanopore sequencing is that it can sequence long reads, which usually helps with mapping, particularly at genome loci with repetitive sequences. This seems beneficial for RAP1 ChIPseq analysis as RAP1 is expected to bind telomere repeats. However, for ChIPseq, the chromatin needs to be fragmented. Larger DNA fragments from ChIPseq experiments will decrease the accuracy of the final calculated binding sites. Therefore, ChIPseq experiments are not supposed to have long reads to start with, so Nanopore sequencing does not seem to bring any advantage. In addition, compared to Illumina sequencing, Nanopore sequencing usually yields smaller numbers of reads, and the sequencing accuracy rate is lower. The Nanopore sequencing accuracy may be a serious concern in the current study. All telomeres have the perfect TTAGGG repeats, all VSG genes have a very similar 3’ UTR, and all 70 bp repeats have very similar sequences. In fact, the active and silent ESs have 90% sequence identity. Are sequence reads accurately mapped to different ESs? How is the sequencing and mapping quality controlled? Furthermore, it is unclear whether the read depth for RNAseq is deep enough.

      The mean sequence length for the ChIP-seq was about 500 bp (see Table S3), which helps to align reads to ESs and distinguish the different ESs, and it is a reasonable size range to define RAP1 binding sites. Although sequencing depths are usually higher in Illumina than in nanopore (all depending on the amount of sequencing), most Illumina short reads map to multiple genomic sequences, making it difficult to distinguish ESs. This is particularly important for RAP1 because it binds to repeats such as 70 bp and telomeric repeats. Mapping short reads to those regions would be virtually impossible; hence, our choice of nanopore sequencing. For RNA-seq, the ~500 bp read length helps sequence alignment to the subtelomeric regions containing many VSG genes. The nanopore reads obtained here had an average sequencing quality score of 12 (i.e., base call accuracy of 94%). Filtering reads with MAPQ ≥ 20 (99% probability of correct alignment) helped us to distinguish RAP1 binding to specific ESs, including silent vs active ES (ChIP-seq) or VSG sequences (RNA-seq). The details of the analysis and sequencing metrics (i.e., sequencing depth and read length) were described in the Methods section “Computational analysis of RNA-seq and ChIP-seq” and Table S3, respectively.
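
The percentages quoted for the base-call score and the MAPQ threshold both follow from the standard Phred scale, where a score Q corresponds to an error probability of 10^(-Q/10). The short sketch below shows the conversion; the function names are illustrative and are not taken from the authors' analysis pipeline.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Probability that a call is correct for a Phred-scaled score q."""
    return 1.0 - 10 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Inverse conversion: Phred-scaled score for a given accuracy (< 1)."""
    return -10.0 * math.log10(1.0 - accuracy)


# Scores quoted in the response above.
base_call_accuracy = phred_to_accuracy(12)   # average read quality score
alignment_accuracy = phred_to_accuracy(20)   # MAPQ filtering threshold

print(f"Q12 base calls: {base_call_accuracy:.1%} accurate")
print(f"MAPQ 20 alignments: {alignment_accuracy:.1%} likely correct")
```

A quality score of 12 corresponds to roughly 94% per-base accuracy and MAPQ 20 to a 99% probability of correct placement, matching the figures given in the response.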

      4) Many statements in the discussion section are speculations without any solid evidence. For example, lines 218 - 219 “likely due to RAP1 conformational changes”, no data have been shown to support this at all. In lines 224-226, the authors acknowledged that more experiments are necessary to validate their observations, so it is important for the authors to first validate their findings before they draw any solid conclusions. Importantly, RAP1 has been shown to help compact telomeric and subtelomeric chromatin a long time ago by Pandya et al. (2013. NAR 41:7673), who actually examined the chromatin structure by MNase digestion and FAIRE. The authors should acknowledge previous findings. In addition, the authors need to revise the discussion to clearly indicate what they “speculate” rather than make statements as if it is a solid conclusion.

      The statement “likely due to RAP1 conformational changes” in lines 218-219 (page 6) is part of the Discussion. We did not make a strong statement but discussed a possibility. We believe that it is beneficial to the reader to have the data discussed, and we do not feel this point is overly speculative.

      For lines 224-226 (page 6), the statement refers to the finding of RAP1 binding to centromeric regions by ChIP-seq, which is a new finding but not the focus of this work. Hence, future studies are necessary for this finding, and we believe it is appropriate in the Discussion to be upfront and highlight this point to the readers. However, for the RAP1 binding to telomeric ES sites, e.g., 70 bp repeats and telomeric repeats (the focus of this work), we validated the binding by EMSA and by performing binding kinetics using microscale thermophoresis.

      We did not include Pandya et al. 2013 NAR because the authors demonstrated RAP1 compaction of chromatin to occur in procyclic forms only. Pandya et al. stated in their abstract: “no significant chromatin structure changes were detected on depletion of TbRAP1 in BF cells”. Hence, the suggested reference is not relevant to the context of our conclusions in bloodstream forms. Nevertheless, we have reviewed the Discussion to avoid broad speculations in the revised version of the manuscript.

      There are also minor concerns:

      1) In the PIP5Pase conditional knockout system, the WT or mutant PIP5Pase with a V5 tag is constitutively expressed from the tubulin array. What’s the relative expression level of this allele and the endogenous PIP5Pase? Without a clear knowledge of the mutant expression level, it is hard to conclude whether the mutant has any dominant negative effects or whether the mutant phenotype is simply due to a lower than WT PIP5pase expression level.

      The relative mRNA levels of exclusively expressed PIP5Pase Mut compared to the WT are available in Data S1 (RNA-seq). The Mut allele’s relative expression level is 0.85-fold that of the WT allele (both from tubulin loci). We also showed by Western blot the WT and Mut PIP5Pase protein expression (Cestari et al. 2019, Mol Cell Biol). Concerning PIP5Pase endogenous alleles, we compared RNA-seq read counts per million from the conditional null PIP5Pase cells exclusively expressing WT or the Mut PIP5Pase alleles (Data S1, this work) to our previous RNA-seq of the single-marker 427 strain (Cestari et al. 2019, Mol Cell Biol). We used the single-marker 427 strain because the conditional null cells were generated in this strain background. The PIP5Pase WT and Mut mRNAs expressed from tubulin loci are 1.6 and 1.3-fold the endogenous PIP5Pase levels in single-marker 427, respectively. We include a statement in the Methods, page 7, lines 265-268: “The WT or Mut PIP5Pase mRNAs exclusively expressed from tubulin loci are 1.6 and 1.3-fold the WT PIP5Pase mRNA levels expressed from endogenous alleles in the single marker 427 strain. The fold-changes were calculated from RNA-seq reads counts per million from this work (WT and Mut PIP5Pase, Data S1) and our previous RNA-seq from single marker 427 strain (24).”
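
The fold changes described here are ratios of library-size-normalized read counts. As a minimal sketch of that arithmetic, the snippet below normalizes raw counts to counts per million (CPM) and takes their ratio. All read counts and library sizes are made-up toy numbers, not values from Data S1 or from the earlier RNA-seq dataset, and the function names are hypothetical.

```python
def counts_per_million(gene_reads: int, library_reads: int) -> float:
    """Normalize a gene's raw read count to counts per million mapped reads."""
    return gene_reads / library_reads * 1_000_000


def fold_change(cpm_test: float, cpm_reference: float) -> float:
    """Expression in the test sample relative to the reference sample."""
    return cpm_test / cpm_reference


# Toy numbers only (NOT from the study): two libraries of different depth.
transgene_cpm = counts_per_million(gene_reads=240, library_reads=3_000_000)
endogenous_cpm = counts_per_million(gene_reads=100, library_reads=2_000_000)

ratio = fold_change(transgene_cpm, endogenous_cpm)
print(f"transgene: {transgene_cpm:.1f} CPM, endogenous: {endogenous_cpm:.1f} CPM")
print(f"fold change: {ratio:.2f}")
```

Normalizing to CPM before taking the ratio matters because the two datasets being compared come from separate sequencing runs and so may differ in depth.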

      2) In EMSA analysis, what are the concentrations of the protein and the probe used in each reaction? The amount of protein used in the binding assay appears to be very high, and this can contribute to the observation that many complexes are stuck in the well. Better quality EMSA data need to be shown to support the authors’ claims.

      All concentrations were provided in the Methods section. See page 9 Electrophoretic mobility shift assays: “100 nM of annealed DNA were mixed with 1 μg of recombinant protein…”. For microscale thermophoresis, also see page 9, Microscale thermophoresis binding kinetics: “1 μM rRAP1 was diluted in 16 two-fold serial dilutions in 250 mM HEPES pH 7.4, 25 mM MgCl2, 500 mM NaCl, and 0.25% (v/v) NP-40 and incubated with 20 nM telomeric or 70 bp repeats…”. Note that two different biochemical approaches, EMSA and microscale thermophoresis, were used to assess rRAP1-His binding to DNA. Both show similar results (Fig 3 and 5, and Fig S5; microscale thermophoresis shows the binding kinetics, data available in Table 1). The EMSA images clearly show the binding of RAP1 to 70 bp or telomeric repeats but not to scrambled telomeric repeat DNA.

      Reviewer #2 (Public Review):

      This manuscript by Touray, et al. provides a significant new twist to our understanding of how antigenic variation may be regulated in T. brucei. Key aspects of antigenic variation are the mutually exclusive expression of a single antigen per cell and the periodic switching from expression of one antigen isoform to another. In this manuscript, the authors show, as they have previously shown, that depletion of the nuclear phosphatidylinositol 5-phosphatase (PIP5Pase) results in a loss of mutually exclusive VSG expression. Furthermore, using ChIP-seq, the authors show that the repressor/activator protein 1 (RAP1) binds to regions upstream and downstream of VSG genes located in transcriptionally repressed expression sites and that this binding is lost in the absence of a functional PIP5Pase. Importantly, the authors decided to further investigate this link between PIP5Pase and RAP1, a protein that has previously been implicated in antigenic variation in T. brucei, and found that inactivation of PIP5Pase results in the accumulation of PI(3,4,5)P3 bound to the RAP1 N-terminus and that this binding impairs the ability of RAP1 to bind DNA. Based on these observations, the authors suggest that the levels of PI(3,4,5)P3 may determine the cellular function of RAP1, either by binding upstream of VSG genes and repressing their function, or by not binding DNA and allowing the simultaneous expression of multiple VSG genes in a single parasite.

      While I find most of the data presented in this manuscript compelling, there are aspects of Figure 1 that are not clear to me. Based on Figure 1F, the authors claim that transient inactivation of PIP5Pase results in a switch from the expression of one VSG isoform to another. However, I am not exactly sure what the authors are showing in this panel, nor do the data in Figure 1F seem to be consistent with those shown in Figure 1C. Based on Figure 1F, a transient inactivation of PIP5Pase appears to result in an almost exclusive switch to a VSG located in BES12. However, based on Figure 1E, the VSG transcripts most commonly found after a transient inactivation of PIP5Pase are those from the previously active VSG (BES1) and VSGs located on chr 1 and 6 (I believe). The small font and the low resolution make it impossible to infer the location of the expressed VSG genes, nor to confirm that ALL VSG genes located in expression sites are activated, as the authors claim. Also, I was not able to access the raw ChIP-seq and RNA-seq reads. Thus, could not evaluate the quality of the sequencing data.

      We appreciate the reviewer’s comments and evaluation of our work. Fig 1E shows VSG-seq of a population after transient (24h) exclusive expression of the PIP5Pase mutant, followed by re-expression of the WT PIP5Pase allele for 60 hours (multiple VSGs are detected). As a control, it also shows VSG-seq in cells continuously expressing WT PIP5Pase (mostly VSG2, BES1, is detected). Fig 1F and Fig S1 show the sequencing of VSGs expressed by clones isolated (5-6 days of growth) after a temporary knockdown (24h) of PIP5Pase (tet -), followed by its re-expression. For comparison, no knockdown (tet +) was included. Fig 1E shows potential switchers in the population; Fig 1F confirms VSG switching in clones.

      To clarify the difference between Fig 1E and 1F, we edited the manuscript on page 3, lines 103-110: “To verify PIP5Pase role in VSG switching, we knocked down PIP5Pase for 24h (Tet -), then restored its expression (Tet +) and isolated clones by limiting dilution and growth for 5-6 days. Analysis of isolated clones after temporary PIP5Pase knockdown (Tet -/+) confirmed VSG switching in 93 out of 94 (99%) of the analyzed clones (Fig 1F, Fig S1). The cells switched to express VSGs from silent ESs or subtelomeric regions, indicating switching by transcription or recombination mechanisms. Moreover, no switching was detected in 118 isolated clones from cells continuously expressing WT PIP5Pase (Tet +, Fig 1F).” We also edited Fig 1F to indicate temporary knockdown (Tet -/+) vs no knockdown (Tet +). The modifications will be available in the resubmitted version of the manuscript.

      We agree that the heat map is difficult to read due to the amount of information. We will include in the revised version of the manuscript a table with the data in the supplementary information; the reader will be able to evaluate the data in detail.

      A preference for switching to specific ESs has been observed in T. brucei (Morrison et al. 2005, Int J Parasitol; Cestari and Stuart, 2015, PNAS), which may explain several clones switching to BES12. Many potential switchers were detected in the VSG-seq (Fig 1E; the whole cell population is over 10^7 parasites), but not all potential switchers were detected in the clonal analysis because we analyzed 212 clones in total, a small fraction of the more than 10^7 cells analyzed by VSG-seq (Fig 1E). It is also possible that not all potential switchers are viable. However, the point of the clonal analysis is to validate VSG switching after genetic perturbation of PIP5Pase.

      Fig 1C shows examples of ES derepression by RNA-seq after 24h exclusive expression of the mutant compared to WT PIP5Pase. The RNA-seq shows that all ESs are derepressed (Fig 1B). This can be visualized in the volcano plot (Fig 1B, BES and MES VSGs are labelled) and on the spreadsheet Data S1. Although all ESs are derepressed after PIP5Pase mutant expression, not all ESs are selected during switching, as observed in Fig 1E-F. This agrees with our previous observations in switching assays with proteins that control VSG switching (Cestari and Stuart, 2015, PNAS).

      As for sequencing metrics and raw sequencing data, see the Methods section, page 13, lines 483-485: “Sequencing information is available in Table S3 and fastq data is available in the Sequence Read Archive (SRA) with the BioProject identification PRJNA934938.” Table S3 has a summary of the sequencing data. Metrics such as sequencing quality and analysis can be found in the Methods section “Computational analysis of RNA-seq and ChIP-seq”. The latter includes information about nanopore reads, i.e., a mean Q-score of 12.

      Reviewer #3 (Public Review):

      In this manuscript, Touray et al investigate the mechanisms by which PIP5Pase and RAP1 control VSG expression in T. brucei and demonstrate an important role for this enzyme in a signalling pathway that likely plays a role in antigenic variation in T. brucei.

      The methods used in the study are rigorous and well-controlled. The authors convincingly demonstrate that RAP1 binds to PI(3,4,5)P3 through its N-terminus and that this binding regulates RAP1 binding to VSG expression sites, which in turn regulates VSG silencing. Overall their results support the conclusions made in the manuscript.

      There are a few small caveats that are worth noting. First, the analysis of VSG derepression and switching in Figure 1 relies on a genome that does not contain minichromosomal (MC) VSG sequences. This means that MC VSGs could theoretically be misassigned as coming from another genomic location in the absence of an MC reference. As the origin of the VSGs in these clones isn’t a major point in the paper, I do not think this is a major concern, but I would not over-interpret the particular details of switching outcomes in these experiments.

      The authors state that “our data imply that antigenic variation is not exclusively stochastic.” I am not sure this is true. While I also favor the idea that switching is not exclusively stochastic, evidence for a signaling pathway does not necessarily imply that antigenic variation is not stochastic. This pathway could be important solely for lifecycle-related control of VSG expression, rather than antigenic variation during infection. Nevertheless, these data are critical for establishing a potential pathway that could control antigenic variation and thus represent a fundamental discovery.

      Another aspect of this work that is perhaps important, but not discussed much by the authors, is the fact that signalling is extremely poorly understood in T. brucei. In Figure 1B, the RNA-seq data show many genes upregulated after expression of the Mut PIP5Pase (not just VSGs). The authors rightly avoid claiming that this pathway is exclusive to VSGs, but I wonder if these data could provide insight into the other biological processes that might be controlled by this signaling pathway in T. brucei.

      Overall, this is an excellent study that represents an important step forward in understanding how antigenic variation is controlled in T. brucei. The possibility that this process could be controlled via a signalling pathway has been speculated for a long time, and this study provides the first mechanistic evidence for that possibility.

      We thank the reviewer for the evaluation of our work. We agree that it is difficult to establish the origin of all VSG genes without minichromosome sequences in the reference; hence, we did not emphasize this point in the manuscript. We used the 427-2018 reference genome assembled by PacBio and Hi-C (Muller et al. 2018, Nature), which we believe is the best assembly for the 427 strain, especially with respect to the VSG genes.

      We also agree that signaling controlling switching in vitro does not mean that switching necessarily occurs by signaling in vivo. Nevertheless, stochastic switching is an accepted model that has not been proven, whereas we provide molecular evidence that signaling can cause switching. To address the reviewer’s suggestion, we edited the Discussion, page 7, line 250: from “our data imply that antigenic variation is not exclusively stochastic” to “our data suggest that antigenic variation is not exclusively stochastic”.

      Most of the upregulated genes in the RNA-seq data were VSG genes/pseudogenes. Other upregulated genes included retrotransposons and DNA/RNA processing enzymes such as endonucleases and polymerases. We included in the Results, page 3, line 100: “Other genes upregulated include primarily retrotransposons, endonucleases, and polymerase proteins.”

    1. Reviewer #3 (Public Review):

      It is well known that as seasonal day length increases, molecular cascades in the brain are triggered to ready an individual for reproduction. Some of these changes, however, can begin to occur before the day length threshold is reached, suggesting that short days similarly have the capacity to alter aspects of phenotype. This study seeks to understand the mechanisms by which short days can accomplish this task, which is an interesting and important question in the field of organismal biology and endocrinology.

      The set of studies that this manuscript presents is comprehensive and well-controlled. Many of the effects are also strong and thus offer tantalizing hints about the endo-molecular basis by which short days might stimulate major changes in body condition. Another strength is that the authors put together a compelling model for how different facets of an animal's reproductive state come "on line" as day length increases and spring approaches. In this way, I think the authors broadly fulfill their aims.

      I do, however, also think that there are a few weaknesses that the authors should consider, or that readers should consider when evaluating this manuscript. First, some of the molecular genetic analyses should be interpreted with greater caution. By bioinformatically showing that certain DNA motifs exist within a gene promoter (e.g., FSHbeta), one is not generating robust evidence that corresponding transcription factors actually regulate the expression of the gene in question. In fact, some may argue that this line of evidence only offers weak support for such a conclusion. I appreciate that actually running the laboratory experiments necessary to generate strong support for these types of conclusions is not trivial, and doing so may even be impossible. I would therefore suggest a clear admission of these limitations in the paper.

      Second, I have another issue with the interpretation of data presented in Figure 3. The data show that FSHbeta increases in expression in the 8Lext group, suggesting that endogenous drivers likely act to increase the expression of this gene despite no change in day length. However, more robust effects are reported for FSHbeta expression in the 10v and 12v groups, even compared to the 8Lext group. Doesn't this suggest that both endogenous mechanisms and changes in day length work together to ramp up FSHbeta? The rest of the paper seemed to emphasize endogenous mechanisms and gloss over the fact that such mechanisms likely work additively with other factors. I felt like there was more nuance to these findings than the authors were getting into.

      Third, studies 1 - 3 are well controlled; however, I'm left wondering how much of an effect the transitions in day length might have on the underlying molecular processes that mediate changes in body condition. While the changes in day length are themselves ecologically relevant, the transitions between day length states are not. How do we know, for example, that more gradual changes in day length that occur over long timespans do not produce different effects at the levels of the brain and body? This seemed especially relevant for study 3, where animals experience a rather sudden change in day length. I recognize that these experimental methods are well described in the literature, and they have been used by endocrinologists for a long time; nonetheless, I think questions remain.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their insights and comments on this manuscript. Specific responses to reviewer concerns are detailed below. We made a couple of significant changes based on the feedback. First, we performed more experiments to increase biologic replicates and then quantified image data for multiple figures. The new quantitative information added to Figure 3 fully supports our original conclusions about changes to the ONH in Hes-TKO mutants. The quantification of Atoh7, Otx2, Rbpms and Crx expressing cells among the different genotypes revealed interesting differences in Notch intracellular gene requirements for both RGC and cone development. The most startling outcome is that changes in both cell types correlate with significant changes in Otx2, but not Atoh7. This singular finding suggests interesting future work is needed, well beyond the scope of this paper about the molecular mechanisms underlying these cell fates. Second, our data presentation was reorganized with new information added to Fig 1 that clarifies the relationships between Hes1, Hes5, Foxg1 and Pax2; old Figs 6 & 7 about neurogenesis were merged; and some data moved to new Suppl Figs 2 and 5. The numbering for multiple figures changed and a new summary model (now Fig 8) is provided. In addition, the manuscript was completely rewritten to improve clarity. We hope this revised manuscript is acceptable for publication.

      Reviewer #1 Summary:

      In this study, the authors employed an impressive set of mouse mutant or Cre lines to investigate the complexity of Notch signaling across different stages of retinal development. These comprehensive analyses led to two main findings: 1. Sustained Hes1 in the ONH/OS is Notch-independent; 2. Rbpj and Hes1 exhibited opposing roles in cone photoreceptor development. Although the study is potentially interesting, the current manuscript lacks essential research background and quantification, which significantly reduces the clarity of the manuscript and the credibility of the major conclusions. Also, the organization of the results is quite confusing, making the manuscript very difficult to follow.

      Response: We agree with all reviewers concerning incomplete quantification of the data. We directly addressed this shortcoming in revised Figs 3 and 6 (the latter combines old Figs 6 +7). To do this, we repeated some IHC experiments to add more replicates and reorganized all of the neurogenesis phenotypic data figures. Our quantifications uncovered several surprising outcomes that clarify our model. For these reasons, the manuscript was exhaustively rewritten. We merged E13 neurogenesis data into revised Figure 6 and moved the most relevant E16 analyses to new supplemental data Fig 5. All changes made should make the paper easier to understand for retinal development, neurogenesis, and Notch pathway aficionados, in addition to readers lacking such expertise.

      Major comments: 1. The authors need to quantify many of the analyses to strengthen the conclusions, such as Fig. 1F, 1G, etc.

      Response: We quantified optic nerve head (ONH) immunohistochemistry data in the revised Fig 3. We also quantified neurogenesis markers Atoh7, Otx2, Rbpms (RGCs), and Crx at E13 in revised Fig 6 (former Figs 6 and 7). Older stages were moved to a new Suppl Fig 5.

      Respectfully, Hes5 mRNA expression in old Fig 1F and 1G shows that Hes5, like other retinal progenitor cell (RPC) markers, expanded in Rax-Cre deletion but not Chx10-Cre deletion conditions. This is analogous to Pax6 and Rax expansion in Rax-Cre;Hes1 CKO eyes and Pax2 mutants (doi: 10.1523/JNEUROSCI.2327-19.2020) (1). In revised Fig 1, we now show analogous expansion of Hes5 mRNA in Pax2 mutant retinas (compare Figs 1F-1I). Because Hes5 RNA in situ hybridization experiments are nonquantitative, we do not discuss the possibility of Hes5 mRNA level changes in labeled cells.

      The authors reported many exciting results. However, further mechanistic insights are largely missing. Examples include: Hes1 suppresses Hes5 expression as the ONH boundary forms; Hes1 expression in the ONH is Notch-independent; and Rbpj and Hes1 have differential influences on cone development. It would be better for the authors to select one of these exciting findings and provide a deeper mechanistic study.

      Response: This revision brings fresh focus to Notch regulation of RGC and photoreceptor development, particularly differential influences for Rbpj versus Hes1. We also better support our interpretation of image data in Fig 1. We include new data about the spatial relationships between Hes5-GFP/Pax2 and Hes5-GFP/Foxg1. In summary, we find that as Pax2 becomes restricted to the nasal optic cup prior to the onset of RGC genesis, it becomes mutually exclusive with Hes5-GFP, at the same time that Hes5-GFP+ cells coexpress Hes1. This is consistent with Hes1 indirectly regulating Hes5-GFP as a marker of neurogenic RPCs at the forming ONH. Furthermore, it emphasizes the importance of genetically teasing apart the separate and potentially compensatory roles for Hes1 versus Hes5 undertaken here. These relationships remain poorly resolved during vertebrate CNS development.

      Some analyses lack an explanation of the rationale. For example, "To understand if the loss of multiple Hes genes is more catastrophic than Hes1 alone..."(PAGE 7). Please explain its significance.

      Response: We assume the reviewer is referring to the first sentence of the last paragraph on this page. We analyzed Hes triple mutant mice (TKO) to understand if removing multiple Hes genes reveals redundant functions. This is an open question, given that Hes1 is expressed in the ONH/OS, which is normally devoid of Hes5 by the time retinal neurogenesis begins. These questions have only been explored in a handful of tissues throughout the body. Also see response to point 2 above. In general, we have expanded the rationale for all of the experiments throughout the revised manuscript.

      Significance: In general, many results are quite interesting. However, the significance of these findings is largely hampered in the following aspects: 1. The authors did not provide sufficient research context, which is essential for understanding many results. 2. Many conclusions were based solely on descriptive images and lacked statistical quantification, which significantly weakened them. 3. Many interesting findings are quite descriptive, and mechanistic understanding of one of these exciting findings would improve the focus and significance of the study. The current format of the manuscript suits a more specialized audience.

      Response: During in vivo development, we wished to understand which particular Notch pathway genes can interact in a Notch-dependent versus a Notch-independent manner. Genetic (phenotypic) studies produce extremely rigorous datasets, in our opinion. This revision now extensively quantifies key findings. Here we dissected the "receipt" of a Notch signal by identically testing the functional requirements of particular pathway members. For Mastermind (Maml), there are 3 paralogues, double mutants for Maml1 and Maml3 are early lethal, and no floxed alleles exist, so it was logical to employ the ROSA-dnMaml mouse strain, particularly since it has been discussed throughout the Notch literature as "analogous" to removing either a Notch receptor or Rbpj. Our finding that the dnMAML allele does not function like a Rbpj null in the retina is important for researchers in the broad Notch field to consider when designing and interpreting experiments.

      Reviewer #2: Hes genes are effectors of the Notch signaling pathway but can also act down-stream of other signaling cascades. In this manuscript the authors attempt to address the complexity of Hes effectors during optic cup development and retinal neurogenesis. To do so, they compared optic cup patterning and retinal neurogenesis in seven germline or conditional mutant mouse embryos generated with two spatio-temporally distinct Cre drivers. These lines allowed for the analysis of the consequences of perturbing the Notch ternary complex and multiple Hes genes alone or in combination. The authors show that the optic disc/nerve head is regulated by Notch independent Hes1 function. They also confirm that perturbation of Notch signaling interferes with cell proliferation enhancing the production of differentiated ganglion cells, whereas photoreceptor genesis requires both Rbpj and Hes1 with Notch dependent and independent mechanisms. This is a rather complex study that dissects further the role of the Notch pathway and Hes proteins during eye development, a topic that has been addressed in many previous studies but perhaps not with the details that the authors have used here. In this respect, this study adds to current literature but will likely be of interest to retina aficionados. The manuscript reads well and the figures are of very good quality. However, many of the statements are based on qualitative rather than on quantitative analysis. This should be, at least in some cases, remediated, despite the effort that this may require given the number of mouse lines used in the study.

      Response: As described in the response to Reviewer 1, we agree and present considerably more quantification data. We extensively reorganized and rewrote this manuscript to emphasize that Hes1 in the ONH/OS is fully Notch-independent and to highlight branchpoints in Notch-dependent signaling, for Rbpj versus Hes1, during early retinal neurogenesis. It is too simplistic to assume that the ternary complex (Rbpj-NICD-Maml) simply activates Hes1 (and/or multiple Hes genes) to regulate downstream signaling targets. This paradigm has been portrayed in the literature numerous times for many processes throughout vertebrate development, homeostasis, or relative to particular diseases. By focusing on one tissue and a narrow window of development, our phenotypic studies delved more deeply to show the greater complexity and molecular cross-talk that we think underlie the modulation of signaling levels with in vivo context. Thus, our results are of broad interest and impact to the greater Notch field.

      1. The title is somewhat misleading. The authors have explored mostly the role of Hes1, 3 and 5. Although these are Notch effectors, there is already evidence that they participate in other pathways. This is confirmed by the data presented here. I would suggest eliminating Notch from the title and using "Hes" instead to better reflect the findings. Furthermore, it is unclear why there is a reference to "mutations" or what the Notch branchpoints are to which the authors refer at the beginning of the discussion.

      Response: We appreciate the reviewer’s viewpoint but disagree that this paper is mostly about Hes genes, as there is a critical, directly comparable evaluation with Rbpj and dn-Maml. Direct comparison of 7 genotypes highlights where each pathway member exhibits idiosyncratic phenotypes. We are striving for a clear, simple title about a very complex topic, involving the in vivo genetic dissection of a signaling pathway. We modified the title to: "Notch pathway mutations do not equivalently perturb mouse embryonic retinal development"

      1. "Although the Pax6-Pax2 boundary is intact in Rax-Cre;RbpjCKO/CKO eyes, ONH shape was attenuated compared to controls (Fig 3I)". This statement is arguable as the difference seems subtle. Perhaps some kind of quantification would help.

      Response: We quantified Pax2+ cells (ONH domain) using the adjacent proximal terminus of the retinal pigmented epithelium (RPE) to indicate a transition from ONH to optic stalk (OS). We also quantified the number of Pax2+Pax6+ double positive cells where the 2 domains abut (boundary cells). Some higher magnification examples are now provided in Fig 3H';3K';3N'. Grossly, the imaging data support that the Pax2+ ONH is expanded in Chx10-Cre;TKO eyes, while boundary cells are most affected in Rax-Cre;HesTKO eyes, due to an expansion of retinal tissue. This is supported by our quantitative data (Fig 3O,3P). We observed that, even in controls, Pax2-expressing cells show some numerical variability. We attributed this to the position of the section through the ONH, which is a 3-dimensional ring (torus). Therefore, we quantified additional wild-type controls and mutant samples in the new Fig 3O,3P graphs, improving statistical power and allowing us to detect quantitative differences.

      Page 12, first paragraph. "....but all other genotypes were unaffected". This statement is unclear. All lines in which the Rax-Cre has been used seem to have an increased number of apoptotic cells. This should be better explained.

      Response: Respectfully, only one genotype, Rax-Cre;Rbpj mutants contain a statistically significant increase in apoptotic cells (Fig 5P). This is demonstrated by one-way ANOVA analyses that included all pairwise comparisons. To ensure that the quantification was not misleading due to changes in tissue morphology, data in Figs 5, 6, and 7 were normalized to optic cup area. The area was traced in FIJI, creating a polygon whose area was determined in square microns. For every section image, the marker+ cells were divided by the square micron area of the retina (excluding the opening for the optic nerve). Such a method is critical for comparison across this allelic series, given the morphologic changes, differences in cell clustering where rosettes form, and reduced proliferation whenever Notch signaling is lost or reduced.
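      The area normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, the polygon area is computed with the standard shoelace formula rather than taken from FIJI, and the example vertices and cell count are invented for demonstration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area_um2(vertices: List[Point]) -> float:
    """Area of a simple polygon (shoelace formula); vertices in microns."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    for i, (x1, y1) in enumerate(vertices):
        # Wrap around so the last vertex connects back to the first.
        x2, y2 = vertices[(i + 1) % len(vertices)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def cells_per_um2(marker_positive_cells: int, vertices: List[Point]) -> float:
    """Marker-positive cell count divided by the traced tissue area."""
    return marker_positive_cells / polygon_area_um2(vertices)


# Hypothetical traced outline: a 100 x 100 micron square with 50 labeled cells.
outline = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
density = cells_per_um2(50, outline)
```

      Expressing counts as a density per section keeps genotypes with different tissue size or morphology comparable before any statistical test is applied.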

      Page 12, end of second paragraph: "E13.5 Chx10-Cre;HesTKO eyes had a milder RGC phenotype (Figs 6G, 6N, 6U), but all other mutants were unaffected (Figs 6E, 6F, 6L, 6M, 6S, 6T)." This statement is also rather subjective. The phenotype of Chx10-Cre;HesTKO is quite strong and the other mutants seem to have a phenotype. Some quantifications here will help.

      Response: We agree and provide quantification for both Atoh7 and Rbpms positive cells in the revised Figure 6. This is now in the same figure with quantification of Otx2+, Otx2+Atoh7+ and Crx+ cells. The reviewer is correct that both ROSA-dnMaml and both HesTKO mutants have a statistically significant increase in RGCs. Surprisingly, neither of the Rbpj CKO mutants have this outcome (Fig 6Y).

      1. Page 13, toward the bottom..."...but noted that Chx10-Cre RbpjCKO/CKO eyes were not different from controls (Figs 7E, 7AA)". Again, this statement is questionable as staining for both CRX and Rbpms seem reduced as compared to controls as quantifications in 7AA seems also to indicate (about half?). Did the authors calculate whether there is a statistical difference between controls and Chx10-Cre RbpjCKO/CKO ?

      Response: Rbpms+ RGCs and Crx+ photoreceptor precursors were colabeled and quantified on sections for all genotypes. All counts were normalized to area as described above. Upon quantification and ANOVA with pairwise comparisons, there was no statistical difference in Crx+ or Rbpms+ cells between control and Chx10-Cre;Rbpj mutants (new Fig 6Y and Z).

      In Fig 7CC the authors should make the effort of including at least one additional sample, 2 biological replicates seem insufficient to draw a conclusion.

      Response: The Rax-Cre;Hes1CKO/+ X Hes1CKO/CKO matings stopped producing litters in late 2022. While this manuscript was out for review, we obtained younger mice, from which new control and Rax-Cre; Hes1 mutant littermates were collected, stained, imaged and quantified. Upon adding samples, we found that the outcome was unchanged, but the data better support the lack of a statistical difference in rods between genotypes at E17. These data were moved to revised Suppl Fig 5.

      Significance: This is a rather complex study that dissects further the role of the Notch pathway and Hes proteins during eye development, a topic that has been addressed in many previous studies but perhaps not with the details that the authors have used here. In this respect, this study adds to current literature but will likely be of interest to retina aficionados. The manuscript reads well and the figures are of very good quality. However, many of the statements are based on qualitative rather than on quantitative analysis. This should be, at least in some cases, remediated, despite the effort that this may require given the number of mouse lines used in the study.

      Response: To increase the impact of our manuscript, we quantified all markers except Tubb3, since its localization in cell bodies and axons make it impossible to assign to individual cells. We feel that this additional quantification strongly improves the quality of our findings and allowed us to make well-supported and novel conclusions. While we certainly believe that the retinal development community will find this paper of interest, it will also be of value to the broader Notch pathway scientific community. In this manuscript, we simultaneously compared phenotypes for Notch pathway genes in signal receiving cells. We could find essentially no studies like this for the mouse CNS and only a few from the Kopan lab about the kidney and immune system. Interestingly, one of us (NLB) is a coauthor on a recent paper about Notch signaling in the cortex, in which ROSA-dnMaml behaves analogously to Notch1CKO or RbpjCKO. This emphasizes that findings in one organ may not recapitulate the "rules" for this pathway for other cell types or tissues (doi: 10.1242/dev.201408)(2). Deeper understanding of how the Notch pathway in the retina functions, analogously or differently, is important. We feel our revised study advances when and where there are "branchpoints" in canonical signaling that may be overlooked in other developing tissues and organs.

      Reviewer #3: I have reviewed a manuscript submitted by Bosze et al., which is entitled "Not all Notch pathway mutations are equal in the embryonic mouse retina". The authors focused on the Notch signaling pathway. Notch signaling is deeply conserved across vertebrate and invertebrate animal species: in general, two transmembrane proteins, Delta and Notch, interact as a ligand and a receptor, respectively, which induces proteolytic cleavage of Notch receptors to generate the Notch intracellular domain (NICD). NICD is translocated into the nucleus, then forms a transcription factor complex including Rbpj (also referred to as CBF1) and Mastermind-like (Maml), and activates the transcription of Hes family transcription factors. Three Hes proteins, Hes1, 3, and 5, are important for nervous system development. In the vertebrate developing retina, these Hes proteins inhibit neurogenesis to maintain a pool of neural progenitor cells. In addition to their primary role in neurogenesis, the authors recently reported that Hes1 promotes cone photoreceptor differentiation. In the later stages of development, Hes proteins also promote Müller glial differentiation. In addition, Hes1 is highly expressed in the boundary between the neural retina and optic stalk and is required for maintenance of this boundary. To understand the precise regulation of the Notch component-mediated signaling network for retinal neurogenesis and cell differentiation, the authors compared retinal phenotypes after disruption of three Notch pathway components, that is (1) Hes1/3/5 cTKO, (2) Rbpj KO, and (3) dominant-negative Maml (dnMaml) overexpression, under the control of two Cre drivers: Rax-Cre and Chx10-Cre.
First, the authors found that Hes1 expression in the boundary between optic stalk and neural retina is lost in Rax-Cre; Hes1/3/5 cTKO, but still retained in Rax-Cre; Rbpj KO and Rax-Cre; dnMaml overexpression, suggesting that Delta-Notch interaction is not required for Hes1 expression in the boundary between optic stalk and neural retina. Furthermore, Hes1 expressing boundary region expands distally at the expense of the neural retina in Chx10-Cre; Hes1/3/5 cTKO. Maintenance of ccd2 expression in this expanded boundary area suggests that Hes1 normally maintains a proliferative state in the optic stalk, which may allow these cells to differentiate into astrocyte in later stages. Second, in addition to precocious RGC differentiation in all the Notch component KO, the authors found that, as compared with wild-type, cone and rod photoreceptor genesis is highly enhanced in Rax-Cre; Rbpj KO and Rax-Cre; dnMaml overexpression and mildly enhanced in Chx10-Cre; dnMaml overexpression. On the other hand, in Rax-Cre; Hes1/3/5 cTKO, cone and rod photoreceptor genesis is not enhanced but similar to wild-type level. Since the authors previously reported that cone genesis is reduced in Rax-Cre; Hes1 cKO and Chx10-Cre; Hes1 cKO, so Rax-Cre; Hes1/3/5 cTKO may rescue decrease in cone genesis in single Hes1 cKO. The authors raise the possibility that elevated Hes5 expression in single Hes1 cKO may suppress cone photoreceptor genesis. The authors also found that amacrine cell genesis is significantly suppressed in Rax-Cre; Rbpj KO but not changed in Rax-Cre; dnMaml overexpression and Rax-Cre; Hes1/3/5 cTKO, suggesting that Rbpj is specifically required for amacrine cell genesis. From these observations, the authors propose that there are at least two branchpoints for photoreceptor and amacrine cell genesis in Notch component-mediated signaling network. 
Their findings are very interesting and provide some new insight on how Notch signaling components are integrated into other signaling pathways and promote to generate diverse but well-balanced retinal cell-types during retinal neurogenesis and cell differentiation, in addition to conventional classic view of Notch signaling pathway. However, one weak point is that, although the authors figured out what kinds of phenotypic difference appear in the KO retinas between these Notch components, the research result is descriptive and less analytical. Most of their conclusions may be supported by their previous works or others; it is still hypothetical. So, it is important to show more analytical data to support their interpretation and more clearly show what is new conceptual advance for Notch signaling pathways.

      For example, sustained Hes1 expression in the boundary region between optic stalk and neural retina may be reminiscent to brain isthmus situation. I would like to request the authors to show more direct evidence that Hes1 regulation in optic stalk/retina boundary is independent of Delta-Notch interaction. One possible experiment is whether DAPT treatment phenocopies Rax-Cre; Rbpj KO and Rax-Cre; dnMaml overexpression (Hes1 in optic stalk boundary is normal?).

      Response: Usage of the gamma secretase inhibitor DAPT is an interesting experiment as it can phenocopy the loss of Notch signaling in developing tissues. However, the reviewer's proposed DAPT experiment is problematic for two major reasons. First, DAPT blocks the gamma secretase complex, which has more than 90 protein targets in the cell membrane (3). Therefore, DAPT may not be informative for Hes1 regulation given the myriad of expected off-target effects. Second, it would be difficult to treat embryos at the relevant stages with DAPT. Injections into pregnant mice are lethal and we cannot localize drug to the relevant area during in vivo development. Our direct phenotypic comparisons with two Cre drivers strongly indicate that Hes1 is independent of canonical Notch signaling in the developing optic stalk.

      We include an extra related data figure (Reviewer Fig 1) showing anti-Hes1 immunolabeling of E13.5 Rax-Cre;Notch1CKO/CKO (n=2) and E13.5 Rax-Cre;Notch2CKO/CKO eyes (n=3). The Notch1 mutant lost oscillating Hes1 expression in retinal progenitors, but the uniform Hes1 ONH domain remains. Interestingly, the Notch2 mutant had essentially no effect on Hes1 (oscillating or sustained), or Hes5 mRNA expression. A Notch2 RNA in situ hybridization demonstrates that Notch2 mRNA was lost in the E13 optic cup and RPE (Rax-Cre expressing tissues). These data emphasize: A) the Notch1-specific dependency of oscillating Hes1 expression in retinal progenitors is absent from the ONH; B) although coexpressed in the same tissue, Notch receptors have unequal activities.

      Does Rax-Cre; Rbpj KO; Hes1-cKO phenocopy Rax-Cre; Hes1-cKO (or Rax-Cre; Hes1/3/5 cTKO)?

      Response: This is a good question! The first author tried very hard to produce Rax-Cre; Rbpj CKO;Hes1 CKO double mutant embryos. However, these progeny could not be recovered from E10-E13 embryos, despite collecting more than 10 litters. Thus, it is likely that this genotype is lethal before eye formation.

      Could the authors identify an enhancer element that drives Hes1 transcription in optic stalk/retina boundary, which should be not overlapped with that of NICD/ Rbpj binding motif? Such additional evidence will make their conclusion more convincing.

      Response: Another interesting question. We have been working for >3 years on Hes1 cis regulatory enhancers, but the pandemic greatly delayed progress. The proximal Hes1 600bp upstream region is a generic enhancer that contains Hes1 binding sites for repressing its own expression (4) and has a pair of Rbpj consensus sites for Notch ternary complex activation of Hes1 expression (5,6). Nearby is a binding site occupied by Gli2 in the E16 mouse retina (7). Recently, it was shown that Ikzf4 binds slightly farther away (8). The upstream 1.8 kb region (including the 600bp just described) can drive destabilized GFP or dsRed reporters in early postnatal retinal explants (9). However, this sequence was used to make and analyze a classic Hes1-GFP transgenic reporter mouse, in which GFP was not expressed in the early embryonic mouse optic vesicle or cup (10). Therefore, any early eye-specific enhancer(s) are located farther upstream, in an intron, or downstream (or combination thereof). Public domain epigenetic and chromatin accessibility datasets support this idea. Identifying the gene regulatory logic for Hes1 expression in the eye will be an exciting future story, well beyond this manuscript. We are excited to use live imaging of enhancer reporters to discern oscillating versus sustained activity patterns during early ocular development.

      Regarding the conclusion on new branchpoints on photoreceptor and amacrine cell genesis, a model shown in Figure 9 is still hypothetical. Figure 9B indicate a model in which the increase of Otx2+ cells and Crx+ cells in Rax-Cre; Rbpj KO is mediated by Hes1, which is presumed to be activated in Notch-independent signaling. However, Hes1 expression in the neural retina is markedly reduced in Rax-Cre; Rbpj KO (Fig. 2I), which does not fit in with the model.

      Response: We removed Fig 9B and now present new models about the Notch-dependent versus -independent roles for both Rbpj and Hes1. The new summary is Fig 8.

      So, I would like to request the authors to examine whether the increase of Otx2+ cells and Crx+ cells in Rax-Cre; Rbpj KO, (or Rax-Cre; dnMaml overexpression and Chx10-Cre; dnMaml overexpression) is inhibited by Hes1 KO.

      Response: If we understand this correctly, it would mean generating double mutants, some of which we determined are not viable (see the response above, and Suppl Table 2). Given there is only a partial knockdown of Hes1 or Hes5 in either dnMaml mutant we do not believe repeating this in the Hes1 CKO genetic background to be informative and it would take 3 generations to perform.

      Second, the authors concluded that both cone and rod genesis are enhanced in Rax-Cre; Rbpj KO by showing the data on Crx/Nr2e3 labeling in Rax-Cre; Hes1 cKO in Fig. 7BB. However, as the authors mentioned in the manuscript, Hes5 expression is elevated in Rax-Cre; Hes1 cKO (Fig. 1G). So, since Rax-Cre; Hes1 cKO has residual Hes activity in the retina, Fig. 7BB should be replaced with labeling of Crx/Nr2e3 in Rax-Cre; Hes1/3/5 cTKO.

      Response: Unfortunately, Rax-Cre;HesTKO embryos do not live past E13 (Suppl Table 2). Thus, we cannot evaluate rods, whose genesis starts around E13.5. Revised Fig 1G shows the Hes5 domain is shifted with the expansion of retinal tissue in E13.5 Hes1 single mutants, but importantly, also analogously shifted in Pax2 mutants (Fig 1H). We do not conclude that mRNA levels are "elevated" since mRNA in situ hybridization is not a quantitative technique. Our initial examination of rods in E17 Rax-Cre;Hes1 CKO mutants tested the idea of a fate shift from cones to rods. However, deeper quantification (Suppl Fig 5) does not support such a fate change.

      Furthermore, possibly, it is best to examine labeling of the retinas of Rax-Cre; Rbpj KO with rod and cone-specific markers and confirm that the number of both rods and cones is significantly increased. Third, as for defects in amacrine cell genesis in Rax-Cre; Rbpj KO, I would like to request the authors to show the data on Chx10-Cre; Rbpj KO. Although Rbpj KO is mosaic in Chx10-Cre; Rbpj KO, we can distinguish Rbpj KO cells by GFP expression (Fig. S2C, C', C'). So, the authors can confirm that amacrine cell genesis is inhibited in a cell-autonomous manner in Chx10-Cre; Rbpj KO retinas but not in Chx10-Cre; dnMaml overexpression. Addition of such data will make the authors' conclusion more convincing.

      Response: Suppl Table 1 lists multiple references (two from the NLB lab) that demonstrated both a rod and cone increase in Rbpj loss-of-function conditions. Chx10;Rbpj CKO animals were evaluated by Zheng et al., who showed an amacrine loss phenotype in these mutants (11). This is equivalent to what we see in our Rax-Cre;Rbpj CKO data, but without the complications of Chx10 mosaic Cre expression upon Rbpj deletion.

      Other comments: 1) Title of this manuscript is "Not all Notch pathway mutations are equal in the embryonic mouse retina". However, this title is quite obscure in what is research advancement of their findings. I suggest the authors to include more concrete and conclusive sentence in the title, for example "Hes and Rbpj differentially promotes retina/optic stalk boundary maintenance and photoreceptor genesis, in parallel with neurogenic inhibition by Notch signaling pathway".

      Response: We appreciate the reviewer's perspective. We are striving for a relatively simple title about a very complex topic, involving the in vivo genetic dissection of a signaling pathway. We modified the title to "Notch pathway mutations do not equivalently perturb mouse embryonic retinal development".

      2) The "Results" section is a bit difficult to follow logics without detailed knowledge on roles of Notch signaling in mouse retinal development. I suggest the authors to improve a writing style of "Results" section for readers without such detailed knowledge on mouse Notch mutant phenotypes to follow logical flow more easily. There are many additional descriptions on research background before start to mention results. Such introductory sentences should be moved to the "Introduction" section, by which logical flow in the Results section should be simpler. In addition, the authors should show a concrete question at the beginning of each result subsection. Furthermore, the authors sometimes jump over from one result subsection and suddenly move to cite another figure panel in a far ahead subsection whose data has not been explained. Such a back-and-forth citation of figure data generally makes it difficult to follow logical flow.

      Response: We now present a considerable amount of new quantified data, reorganized multiple figures, and extensively rewrote the paper. We significantly revised the summary figure to improve clarity. In addition, Suppl Table 1 provides a wealth of background information to orient the reader on this topic. We feel that this extensive revision has greatly improved the quality, logical flow, and readability of the manuscript.

      3) In addition, figure configuration is not well organized. Each figure compared some particular marker expression in wild-type, Rax-Cre; HesTKO, Rax-Cre; Rbpj cKO, Rax-Cre; dn-Maml-GFP, Chx10-Cre; HesTKO, Chx10-Cre; Rbpj cKO, Chx10-Cre; dn-Maml-GFP. For example, Fig. 2 shows Hes1 for inhibition of neurogenesis, Fig. 3 shows Vsx2; Mitf and Pax2; Pax6 for retinal pigmented epithelium and optic stalk, Fig. 6 shows Atoh7, Rbpms, and Tubb3 for retinal ganglion cells. Fig. 7 shows Crx, Otx2, and Thrb2 for photoreceptor differentiation. Fig. 8 shows Prdm1, and Ptf1a for photoreceptors and amacrine cells. Although this figure configuration is convenient to show phenotypic difference between different genetic mutations, it is difficult to know how each differentiation steps are spatially and temporally coordinated during development. At least, I recommend the authors to show one summary figure, which shows spatio-temporal expression profile of retinal markers in wild-type mouse retinas.

      Response: We recognize this point and completely reorganized and combined Figs 6 and 7 to improve clarity. New Figure 6 presents E13 quantification for Atoh7, Otx2, Atoh7/Otx2, Rbpms and Crx expressing retinal populations. E16-E17 data were condensed and moved to a new Suppl Fig 5.

      4a) Page 7, line 7-10 "With earlier deletion using Rax-Cre, hes5 mRNA abnormally extended into the optic stalk": I wonder how the authors define the optic stalk. It is likely that optic stalk area (Pax2+, Vax1+ area) is shifted to more proximal (depart from the optic cup and move toward the brain), and neural retina is expanded accordingly (Fig. 4B, 4F), resulting in expansion of hes5 expression. Thus, it may be better to mention that optic stalk/neural retina boundary is abnormally shifted toward the brain.

      Response: The retina, including the optic nerve head, ends where the adjacent RPE terminates. This is conspicuous morphologically in our sections. We also defined this by colabeling for Pax2 and Pax6, which is now quantified in revised Fig 3. To clarify this further, we added the words "in all panels the brain is to the right" to the Fig 4 legend.

      4b) Page 8, line 14-15, "ONH/OS cells still express it (Hes1), demonstrating that sustained Hes1 is independent of Notch": I presume that Cre-Rax drives Cre in neural retina as well as optic stalk and pigmented epithelium. However, it is likely that Rbpj is not expressed in optic stalk/neural retina boundary area in wild type (Fig. S2A). No expression of Rbpj in optic stalk/neural retina boundary may support that Hes1 expression in this boundary area is Notch-independent. However, Rbpj expression is retained in some vitreal cells near optic nerve head in Rax-Cre; Rbpj-CKO retinas (Fig. S2B). What are these Rbpj+ cells? I would like to request the authors to confirm that Rbpj expression is completely absent in both neural retina and optic stalk in Rax-Cre; Rbpj-CKO mice. Otherwise, this conclusion is still not fully supported.

      Response: We show the Rax-Cre lineage in Suppl Fig 2 via the Ai9 (tomato) reporter. The results are striking, with all of the optic cup derivatives (retina, RPE, ONH, optic stalk, and presumptive ciliary tissue and iris) being tomato positive, while the well-described population of vascular cells in the hyaloid space lacks tomato expression. Furthermore, our figure shows that Rbpj expression is only absent from the optic cup derivatives, rather than the vascular structures in the vitreous. Vascular cells also depend on the Notch pathway and express Rbpj. Based on considerable evidence from the literature and our lineage experiments, the population of cells the reviewer highlights represents the hyaloid vasculature and associated cell types. It does not represent any population that derives from neuroectoderm.

      4c) Page 9, line 16-18, "Foxg1 had spread into the nasal optic stalk": Is Foxg1 expanded nasal area really "OS" rather than expanded retina? I suggest the authors to confirm molecular markers Pax2 expression is overlapped with Foxg1. Otherwise, it is difficult to conclude that foxg1 is expanded into the optic stalk territory, because foxg1 is normally a marker of retina. Indeed, Fig. 3K shows pax2 expression is shifted into more inside towards the brain, suggesting that neural retina is expanded. Please explain the situation.

      Response: Foxg1 (BF-1) mRNA and protein are found in the nasal retina and are expressed in other brain tissues. Multiple studies show Foxg1 in the nasal side of the E10 optic cup/retina/optic stalk and developing hypothalamus (see extra data figure Reviewer Fig 2; the top row is data from Smith et al., 2017 (12), with Foxg1 mRNA in purple). Also see our new manuscript panel Fig 1C. We include here for reviewers extra data (Reviewer Fig 2) showing E13 ocular cryosections colabeled for Foxg1 and Pax2, highlighting their relationship in the retina, optic stalk and adjacent forming hypothalamus. On page 9 the text now reads "At E13.5 Rax-Cre;HesTKO eyes, the Foxg1 nasal retinal domain was contiguous with the nasal optic stalk (Suppl Fig 4D). This is reminiscent of younger stages (Fig 1C), since normally at E13.5, Foxg1 in the nasal optic cup/retina is separated from expression in the ONH/OS (Suppl Fig 4A). Based on the expansion of Pax6, Vsx2 and Hes5 RPC domains into the optic stalk, we conclude that the change in Foxg1 similarly reflects an extension of retinal tissue."

      4d) Page 10, line 4-5, "In Rax-Cre; Hes1/3/5 cTKO eye, this tissue (RPE) extended into the optic stalk": This description seems to be incorrect. A part of Pax2 area, which is adjacent to the neural retina, contacts with RPE in wild type (Fig. 3AH), so most of RPE covers the neural retina even in Fig. 3DK.

      Response: We disagree with the reviewer’s interpretation. Fig 3D shows Mitf labeling of RPE nuclei. Figure 3K shows the adjacent section labeled with Pax2 and Pax6 (labels both retina and RPE). As the retina extended "towards the brain", the RPE analogously extends and surrounds the retinal domain. We also added higher magnification data panels 3H, 3K and 3N, showing merged and single channels.

      4e) Page 10, line 22-23, "For Chx10-Cre; Hes1/3/5 cTKO, there was a unique presence of ectopic Pax2 within the retinal territories": I wonder if this description is correct. I suspect that proliferative Pax2+ cells expand into regressing territory of Hes KO retinal cells, which undergo precocious neurogenesis and lose proliferative activity, in Chx10-Cre; HesTKO. In this case, it is possible that the Pax2/Pax6 interface may be maintained. Please show red and green channel panels for Fig. 3N to confirm that there are ectopic Pax2 and Pax6 double positive cells.

      Response: New quantification in revised Fig 3 (see panels O,P) fully supports our original conclusion. Only Chx10-Cre;HesTKO mutants have a statistically significant increase in Pax2+ cells. There are not more Pax2+Pax6+ double labeled cells. Only this particular genotype has an increase in Pax2+ single labeled cells.

      5a) Page 11, line 20-25. There seems to be inconsistency between result description and image data of Fig. 5A-G, and histogram Fig. 5O. Authors mentioned that a modest loss of pH3+ cell fraction in Chx10-Cre; Hes1/3/5 cTKO but not in Rax-Cre; Hes1/3/5 cTKO. However, Fig. 5D indicates severe reduction of pH3+ cell fraction in Rax-Cre; Hes1/3/5 cTKO, which is similar to reduction of pH3+ cell fraction in Rax-Cre; Rbpj (Fig. 5B), but histogram data is different (Fig. 5O). Furthermore, pH3+ cell fraction is severely reduced in Chx10-Cre; ROSA(dn-Maml-GFP) (Fig. 5F) and modestly reduced in Chx10-Cre; Hes1/3/5 cTKO (Fig. 5G). However, pH3+ cell fraction seems to be normal in Chx10-Cre; Rbpj (Fig. 5E). These Chx10-Cre image data do not match the histogram of Fig. 5O. Please check their situation.

      Response: Images in old Figs 5-8 were normalized using area measurements, see methods and above comments (note: old Figs 6&7 were combined into new Fig 6). One-way ANOVA with pairwise comparisons for each mutant genotype compared to control were calculated using Prism. All genotypes except two have a statistically significant loss of M phase cells and we discuss possibilities for this outcome (Fig 5O). A normalization method for the sampled area is an essential component of these studies since morphologic differences are apparent for particular genotypes. The quantitative data are consistent with our original conclusions.
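
      The area normalization and one-way ANOVA mentioned in this response can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the actual analysis was run in Prism, the function names and all numbers below are hypothetical, and only the omnibus F statistic is computed (a p-value and the pairwise comparisons against control would need an F distribution and a post-hoc test from a statistics package).

```python
from statistics import mean

def normalize(counts, areas):
    """Hypothetical helper: marker-positive cells per unit of sampled area."""
    return [c / a for c, a in zip(counts, areas)]

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up pH3+ counts and sampled areas (arbitrary units), not data from the study.
control = normalize([42, 45, 40], [1.00, 1.05, 0.98])
mutant_a = normalize([30, 28, 33], [0.90, 0.88, 0.95])
mutant_b = normalize([41, 39, 44], [1.00, 0.97, 1.02])
f_stat, df_b, df_w = one_way_anova_f([control, mutant_a, mutant_b])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      Dividing counts by the sampled area before testing is what allows genotypes with visibly different eye morphology to be compared on a common scale.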

      5b) Fig. 5H-N, P: I wonder if the stage E13 is appropriate to evaluate cell death and survival because optic cup already becomes smaller in Rax-Cre; Rbpj, Hes1/3/5 cTKO, or ROSA(dn-MAML-GFP) than in wild-type control. I suggest the authors examine more earlier stage.

      Response: While an earlier effect is possible, we only observed size differences in a subset of the genotypes. Thus, E13 serves as a critical timepoint to examine early developmental phenotypes across the totality of our mutant conditions. It is also the first age at which the ONH is fully formed.

      5c) Page 12, line 19-20, "all other mutants (Chx10-Cre; Rbpj, and Chx10-Cre; ROSA(dn-MAML-GFP)) were unaffected (Fig. 6EF, LM, ST)": It is likely that Atoh7-expressing cells are mildly decreased and cells expressing the neuronal markers Tubb3 and Rbpms are increased in Chx10-Cre; Rbpj, and Chx10-Cre; ROSA(dn-MAML-GFP). I request the authors to evaluate the fraction of these markers in retinal area statistically in all the cases.

      Response: As described above, we quantified Atoh7 and Rbpms nuclear expression by immunohistochemistry. We do not believe that Tubb3+ cells can be reliably quantified. Nonetheless, it is useful to qualitatively show the extent of excess neuron formation. Importantly, we observed that it is not the Atoh7 status that matters for RGC formation, rather it is the Otx2 expression status. This is in good agreement with single cell-RNA transcriptomics data from Wu et al 2021 showing that Atoh7 mRNA in all early transitional RPCs remains fairly constant and its loss does not block the formation of early RGC cell states (13). By contrast Otx2 fluctuates but remains expressed in transitional RPCs that progress to photoreceptor lineages.

      6a) Page 7, line 19 "Ectopic blood vessels protruded from the ONH (Fig. 1K, 1L)": It is difficult to see blood vessel structures in these panels (Fig. 1I-L). Please show some molecular marker of blood vessels to confirm how blood vessel is organized in Hes1/3/5 cTKO.

      Response: These vascular structures are highly conspicuous by morphology in the H&E insets. Nonetheless, we used adjacent P21 sections to immunolabel for Endomucin (14) and Tubb3. This colabeling confirms the morphology and position of ectopic blood vessels in the abnormal tissue masses in Chx10-Cre;HesTKO mutant eyes. Ectopic tissue contains only rare Tubb3+ cells or cell processes, suggesting it is overwhelmingly nonneural. All P21 data were moved to a new Suppl Fig 2. A full detailing of vascular phenotypes is beyond the scope of this manuscript and, interestingly, would be potentially attributable to non-autonomous effects of perturbing the Hes genes in the adjacent retina.

      6b) Fig. 5: Increase of pH3 fraction indicates several possibilities, for example (1) increased fraction of mitotic cells due to precocious neurogenesis, (2) increased fraction of mitotic cells due to activated cell proliferation of retinal progenitor cells, (3) increased cell-cycle arrest in M phase due to some stress response of progenitor cells. So, I suggest the authors to examine (1) BrdU percentage of retinal section area, (2) the percentage of pH3+ cells in PCNA+ retinal cells.

      Response: The data listed in Suppl Table 1 presents a unified picture that disrupting Notch signaling reduced proliferation. This paradigm extends to other model organisms (e.g., Drosophila, chick, frog, zebrafish and even to nonneural tissues). We included the phospho-histone H3 staining so readers would see how the six mutants evaluated in this study align with this paradigm, providing confidence for the novel findings in other figures. A full evaluation of cell cycle kinetics is interesting, but beyond the scope and focus of this manuscript.

      6c) Fig. 5: It is better that cell death fraction will be evaluated by TUNEL and labeling with anti-activated caspase 3 antibody.

      Response: We disagree. The DNA repair enzyme PARP is inactivated upon cleavage by activated caspase 3. There are currently ~3,600 citations that use it as a marker of apoptosis. PARP also has a separate and very specific role in maintaining the integrity of sperm DNA. This antibody works on all metazoans and is amenable to many tissue preparations and fixatives, making it easy to use, robust and quantifiable.

      7a) Please show red channel (Hes1) image in Fig1BC.

      Response: This was added to Revised Fig 1 (Fig 1A).

      7b) Fig. 1DH should be shown in neighbor. Fig. 1H should be assigned as Fig. 1E.

      Response: The new Fig 1 layout addresses this point.

      7c) Fig. S2D, F, H, J: Please show GFP green channel as well. Otherwise, it is difficult to see non-overlapping expression in optic stalk area.

      Response: In the revision, this is Suppl Fig 3. Chx10-Cre is not expressed by ONH-OS cells (1). The green and fuchsia overlap (coexpression) in RPCs is white, which we feel is fairly clear. If needed, all readers can turn on and off the green channel in the final PDF version of this figure to compare GFP with Hes1 expression for those panels.

      7d) Fig. 9B: It is better to show Rax-Cre: Hes1/3/5 TKO rather than Rax-Cre: Hes1 cKO. 7e) Fig. 9B: Lettering "Rbpj mutant" should be revised as "Rax-Cre: Rbpj KO".

      Response: Fig 9B was removed so these terms are now irrelevant. Our models are presented in new Fig 8.

      Significance: The senior author of this manuscript, Dr. Nadean Brown, is an expert scientist who has investigated the role of the Notch signaling pathway in vertebrate ocular tissue, including the neural retina and lens. In general, the Notch signaling pathway consists of a signaling stream from the interaction of Delta and Notch, Notch receptor activation by proteolytic cleavage, translocation of Notch intracellular domain (NICD) into the nucleus, formation of a transcription factor complex consisting of NICD/Rbpj/Maml, to the transcriptional activation of Notch target genes, Hes family transcription factors. Finally, Hes suppresses the neurogenic program and maintains a pool of neural progenitor cells. Therefore, Notch is a key factor regulating the balance between neurogenesis and progenitor proliferation. In this manuscript, the authors investigated retinal phenotypes in the knockout mice of different Notch signaling components, including Rbpj, Maml, and Hes. They found that functions of these three factors are not always equal in retinal cell differentiation; rather, they specifically regulate a particular step of retinal development. The authors propose the possibility that each of the Notch signaling components may be modified by other signaling pathways and achieve some new roles beyond the conventional frame of the classic Notch signaling pathway. On this point, this work has the potential to provide a new conceptual advance in the field of developmental and cell biology.

      We fully agree this work is a significant advance for the fields of developmental and cell biology. Our findings provide new information and stimulate fresh ideas for anyone working on signal transduction and signal integration.

      References cited:

      1. Bosze et al., 2020 Journal of Neuroscience Vol 40:1501-13; Bosze et al. 2021 Dev Biol Vol 472:18-29.
      2. Han et al., 2023 Development Vol 150 dev201408.
      3. Kopan and Ilagan, 2004 Nat Rev Cell Biol. Vol 5:499-504
      4. Hirata et al., 2002 Science Vol 298:840-3
      5. Friedmann and Kovall, 2010 Protein Sci. Vol 19:34-46
      6. Ong et al., 2006 JBC Voll24:5106-19
      7. Wall et al., 2009 J Cell Biol. Vol 184:101-12.
      8. Javed et al., 2023 Development Vol 150:dev200436
      9. Matsuda and Cepko 2007 PNAS Vol 104:1027-1032
      10. Ohtsuka et al., 2006 Mol. Cell Neurosci. Vol 31:109-22
      11. Zheng et al., 2009 Molecular Brain Vol 2:38
      12. Smith et al., 2017 Journal of Neuroscience Vol 37:7975-93.
      13. Wu et al., 2021 Nature Communications Vol 12:1465: doi 10.1038/s41467-021-21704-4
      14. Saint-Geniez et al., 2009 IOVS Vol 50: 311-21.
    1. Author Response

      Reviewer #2 (Public Review):

      Associative learning assigns valence to sensory cues paired with reward or punishment. Brain regions such as the amygdala in mammals and the mushroom body in insects have been identified as primary sites where valence assignment takes place. However, little is known about the neural mechanisms that translate valence-specific activity in these brain regions into appropriate behavioral actions. This study identifies a small set of upwind neurons (UpWiNs) in the Drosophila brain that receive direct inputs from two mushroom body output neurons (MBONs) representing opposite valences. Through a series of behavioral, imaging, and electrophysiological experiments, the authors show that UpWiNs are differentially regulated by the two MBONs, i.e., inhibited by the glutamatergic MBON-α1 (encoding negative valence) while activated by the cholinergic MBON-α3 (encoding positive valence). They also show that UpWiNs control the wind-directed behavior of flies. Activation of UpWiNs is sufficient to drive flies to orient and move upwind, and inhibition of UpWiNs reduces flies' upwind movement toward the source of reward-predicting odors (CS+). These results, together with existing knowledge about the function of the mushroom body in memory processing, suggest an appealing model in which reward learning decreases and increases the responses of MBON-α1 and MBON-α3 to the CS+ odor, respectively, and these changes cause UpWiNs to respond more strongly to the CS+ odor and drive upwind locomotion. Interestingly, in the final part of the results, the authors reveal a wind-independent function of UpWiNs: increasing the probability that flies will revisit the site where UpWiNs were activated. Thus, UpWiNs guide learned reward-seeking behavior with and without airflow. 
Although the mushroom body has been extensively studied for its role in learning and memory, the downstream neural circuits that read the information from the mushroom body to guide memory-driven behaviors remain poorly characterized. This study provides an important piece of the puzzle for this knowledge gap.
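
      The logic of the model summarized above (inhibition from MBON-α1, excitation from MBON-α3, both shifted by reward learning) can be made concrete with a toy calculation. This is only a sketch under strong assumptions: the rectified linear sum, the unit weights, and all response values are hypothetical and are not taken from the study.

```python
def upwin_drive(mbon_a1, mbon_a3, w_inh=1.0, w_exc=1.0):
    """Toy net drive onto an UpWiN: excitatory MBON-alpha3 input minus
    inhibitory MBON-alpha1 input, rectified at zero (assumed form)."""
    return max(0.0, w_exc * mbon_a3 - w_inh * mbon_a1)

# Hypothetical CS+ odor responses in arbitrary units.
before_learning = upwin_drive(mbon_a1=1.0, mbon_a3=1.0)
# Reward learning: MBON-alpha1 response depressed, MBON-alpha3 response potentiated.
after_learning = upwin_drive(mbon_a1=0.5, mbon_a3=1.5)

print(before_learning, after_learning)
```

      In this toy version, both changes push in the same direction, so the UpWiN response to the CS+ odor rises after learning even though the two MBONs change in opposite ways.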

      Strength

      1) Memory studies have predominantly relied on binary choice (go or no-go) assays as measures of memory performance. While these assays are convenient and efficient, they fall short of providing a comprehensive understanding of underlying behavioral structures. In an effort to overcome this limitation, the current study used video recording and tracking software to delve deeper into memory-guided behavior. This innovative approach allowed the authors to uncover novel neurons and examine their contribution to behavior with a level of detail not possible with binary choice assays.

      2) This study used electron microscopy-based Drosophila hemibrain connectome data to reveal the synaptic connection between UpWiNs and MBON-α1 and MBON-α3. Using this method, the study shows that a single UpWiN receives direct input from both MBON-α1 and MBON-α3, which is confirmed by a functional imaging experiment. The connectome dataset also reveals several neurons downstream of UpWiNs, opening avenues for further research into the neural mechanisms linking memory and behavior.

      Weakness

      1) The authors repeatedly state in the manuscript that MBON-α1 and MBON-α3 convey appetitive or aversive memories, respectively. This assertion may not be entirely accurate. Evidence from sugar reward conditioning experiments suggests that MBON-α3 is potentiated and required for sugar reward memory retrieval. Therefore, the compartmentalization for appetitive and aversive memories appears not as obvious at the level of MBONs.

      What we intended was that activation of DANs in these compartments can induce aversive and appetitive memories, respectively, when paired with odors, and that these are the sole output pathway from these compartments to read out the memories in these compartments. As we previously proposed (Aso et al., 2014a eLife), these MBONs can integrate inputs from MBONs of other compartments and their activity can reflect appetitive memory stored as synaptic plasticity in other compartments. Since DANs in the α3 compartment respond to heat, bitter and electric shock but not sugar, the observation that MBON-α3 acquires an enhanced CS+ odor response after appetitive conditioning is presumably due to these intercompartmental connections rather than plasticity of KC-MBON synapses in the α3 compartment. In any case, the fact that excitatory activity of MBON-α1 and MBON-α3 conveys opposite valence of memory still holds true since appetitive conditioning induces depression and potentiation of odor responses, respectively.

      To clarify this point, we now cite related literature in the following sentence in the final paragraph of the Introduction: “UpWiNs receive inputs from several types of lateral horn neurons and integrate inhibitory and excitatory inputs from MBON-α1 and MBON-α3, which are the output neurons of MB compartments that store long-lasting appetitive or aversive memories, respectively (Aso and Rubin, 2016; Ichinose et al., 2015; Jacob and Waddell, 2022a; Pai et al., 2013; Yamagata et al., 2015).”

      2) This study did not conclusively establish the importance of the MBON-α1/α3 to UpWiN pathways in memory-driven behavior. In the experiments shown in Figure 5, flies were trained to associate the activation of reward-related DANs with a specific odor (CS+). After conditioning, UpWiNs were observed to show enhanced responses to the CS+ odor. However, the results should be interpreted with caution because the driver line used to activate DANs (R58E02-LexAp65) labels not only DANs projecting to the MBON-α1 compartment, but all DANs in the protocerebral anterior medial (PAM) cluster. Thus, it remains unclear to what extent the observed enhanced responses are influenced by changes in inhibitory inputs from MBON-α1. While UpWiNs have been shown to play a critical role in the expression of sugar reward memory (Figure 7), it should be noted that UpWiNs receive inputs from multiple upstream neurons, making it difficult to accurately assess the contribution of MBON-α1/α3 to UpWiN pathways in UpWiN recruitment. Further research is needed to fully address this issue.

      We totally agree with this point and have added a sentence explaining an alternative mechanism: “This enhancement of CS+ response can be most easily explained as an outcome of disinhibition from MBON-α1 whose output had been decreased by memory formation; MBON-α1 is inhibitory to UpWiNs (Figure 4B) and MBON-α1 response to the CS+ is reduced following the same training protocol (Yamada et al. 2023). In addition to such a mechanism, plasticity in the β1 compartment may contribute to the enhanced CS+ response in UpWiNs because the driver R58E02 contains DANs in the β1 and glutamatergic MBON from the β1 directly synapse on the dendrites of MBON-α1 and MBON-α3.”

      3) UpWind neurons (UpWiNs) were so named because their activation promotes upwind locomotion. However, when activated in the absence of airflow, flies show increased locomotor speed and an increased probability of revisiting the same location (Figure 7 and Figure 7-figure supplement 1). The revisiting behavior can be observed during the activation of UpWiNs, which is distinct from the local search behavior that typically begins after a reward stimulus is turned off (e.g., Gr64f-GAL4 results in Figure 7-figure supplement 1).

      Return probability was calculated within a 15-s time window. High return probability during LED ON period (10-20s) in Figure 7-figure supplement 1 does not necessarily mean that flies returned during LED ON period. If a fly is at the position A when t=10s, to be counted as “returned”, it needs to move more than 10mm away from A and move back to the position less than 3mm distance from A by t=25s. In the case of sugar sensory neuron activation with Gr64f-GAL4, the peak of return probability is shifted toward a later time point because flies stop and extend proboscis during activation period.
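
      The return criterion described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis code used in the study: the function names, the track format (a list of (x, y) positions in mm sampled at a fixed frame rate), and the handling of tracks that end before the window closes are all assumptions made for the example. Only the three thresholds (15-s window, more than 10 mm away, back within 3 mm) come from the text.

```python
import math

def returned(track, start_idx, fps, window_s=15.0, away_mm=10.0, back_mm=3.0):
    """Return True if the fly leaves and then revisits its starting position.

    track     : list of (x_mm, y_mm) positions, one per frame (assumed format)
    start_idx : frame index of the reference position A
    fps       : frames per second of the recording
    """
    ax, ay = track[start_idx]
    # Last frame that still falls inside the time window (clipped to track end).
    end_idx = min(len(track) - 1, start_idx + int(round(window_s * fps)))
    went_away = False
    for x, y in track[start_idx + 1 : end_idx + 1]:
        dist = math.hypot(x - ax, y - ay)
        if not went_away:
            # Step 1: the fly must first move more than away_mm from A.
            if dist > away_mm:
                went_away = True
        elif dist < back_mm:
            # Step 2: afterwards it must come back to within back_mm of A.
            return True
    return False

def return_probability(tracks, start_idx, fps, **kwargs):
    """Fraction of flies that satisfy the return criterion."""
    flags = [returned(t, start_idx, fps, **kwargs) for t in tracks]
    return sum(flags) / len(flags)
```

      Because the criterion is evaluated over the full 15 s following each reference time point, a fly counted as "returned" for a reference time during the LED ON period may complete its return after the LED has been switched off, which is the point made in the response above.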

      Because revisiting a location can also be a consequence of repeated turns, it seems more accurate to describe UpWiNs as controlling the speed and likelihood of turns and promoting upwind movement by integrating with neurons that sense the direction of airflow.

      The return probability plotted in Figure 7E is the probability of returning, within 15 s after the LED period, to the position occupied at the end of the LED period. During this time, the angular speed of SS33917>CsChrimson and SS33918>CsChrimson flies is identical to that of empty-split-GAL4>CsChrimson control flies (Figure 7-figure supplement 1). Thus, revisiting behavior cannot be explained by a simple increase in turning probability.

      Although the functions of UpWiNs are not limited to promoting wind-directed walking, we still think that “UpWind Neurons” is a practical name for a broad readership and for oral communication at the current stage of investigation, because the EM neuron IDs and names (SMP348, SMP353, SMP354, SLP399 and SLP400) are too lengthy and do not contain any functional information. We initially defined a set of 11 neurons labeled by SS33917 split-GAL4 as “UpWind Neurons (UpWiNs)” based on initial optogenetic screening (Figure 2A). We found other driver lines for mushroom body interneuron cell types that can promote release of dopamine and a more robust returning phenotype (e.g. SS49755), but SS33917 remained the champion driver line for the upwind locomotion phenotype.

      Reviewer #3 (Public Review):

      Aso et al. provide insight into how learned valences are transformed into concrete memory-driven actions, using a diverse set of proven techniques.

      Here the authors use a four-armed arena to evaluate flies' preference for a reward-predicting odor and measure upwind locomotion. This behavioral paradigm was combined with the photoactivation of different memory-eliciting neurons, revealing that appetitive memories stored in different compartments of the mushroom bodies (center of olfactory memory) induce different levels of upwind locomotion. The authors then proceed to a non-exhaustive optogenetic screen of the neurons located downstream of the output neurons of the mushroom bodies (MBONs) and identify a group of 8-11 Cholinergic neurons promoting significant changes in upwind locomotion, the UpWins. By combining confocal immunolabelling of these neurons with electron microscope images, they manage to establish the UpWins' connectome within themselves and with the MBONs. Then, using two in vivo cell recording techniques, electrophysiology, and calcium imaging, they define that UpWins integrate both inhibitory and excitatory synaptic inputs from the MBONs encoding appetitive and aversive memory, respectively. In addition, they show that the UpWins' response to a reward-predicting odor is increased after appetitive training. On a behavioral level, the authors establish that the UpWins respond to wind direction only and are not involved in lower-level motor parameters, such as turning direction and acceleration. Finally, they demonstrate that the UpWins' activity is necessary for long-term appetitive memory retrieval, and even suggest a broader role for the UpWins in olfactory navigation, as their photoactivation increases the probability of revisiting behavior. In the end, the authors state that they provide new insights into how memory is translated into concrete behavior, which is fully supported by their data. Altogether, the authors present a pretty complete study that provides very interesting and reliable data, and that opens a new field of investigation into memory-driven behaviors.

      Strengths of the study:

      • To support their conclusions, the authors provide detailed data from different levels of analysis (behavioral, cellular, and molecular), using multiple sophisticated techniques.

      • The measurement of multiple parameters in the behavioral analysis supports the strong changes in upwind locomotion. In addition, taken individually these parameters provide precise insights into how upwind locomotion changes, and allow the authors to more precisely define the role of the UpWins.

      • The authors use split-Gal4 drivers instead of Gal4, allowing them to better refine neuron labelling.

      • The authors discussed and investigated all possible biases, making their data very reliable. For example, they demonstrated that the phenotypes observed in the behavioral assay were wind-directed behaviors and could not be explained by bias avoidance of the arena's center area.

      Limitations of the study:

      • In the absence of more precise drivers, the UpWins' labelling lacks precision. For example, there is no way to know exactly which UpWin is responding in the electrophysiological experiment presented in Figure 4.

      We have ongoing efforts to generate split-GAL4 and split-LexA driver lines for specific subsets of UpWiN neurons, but the data using those lines are not ready for this manuscript. However, we would like to point out that historically, identification of a group of neurons with striking phenotype has been foundational to promote follow-up studies. A good example is P1 neurons for courtship behavior.

      • The screening of neurons located downstream of the MBONs is not exhaustive, meaning that other groups of neurons might be involved in memory-driven upwind locomotion. Although, it does not diminish the authors' conclusions.

      UpWiNs are certainly not the only cell type mediating memory-driven upwind locomotion, since studies from our group and others (e.g. Matheson et al., 2022; PMCID: PMC9360402) have identified a collection of cell types that can promote upwind locomotion upon optogenetic activation.

      In 2021, we released images and driver lines of a larger collection of split-GAL4 driver lines at https://splitgal4.janelia.org. We are preparing a manuscript to provide anatomical descriptions of these lines. This collection of new drivers will help elucidate more comprehensive views of circuits for memory-driven actions.

      • All data were obtained with walking flies. So far, there have been no experiments on flying flies.

      This is an intriguing question and we mentioned in Discussion that “Our study was limited to walking behaviors, and the role of UpWiNs in flight behaviors remains to be investigated.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer # 1

      Specific comments

      1) Figure 1: it is unclear how many mice were used for the described phenotypic analyses (panels D and E). Please clarify.

      We acknowledge that we failed to describe the phenotypic analyses clearly. In Figure 1D and E, we performed statistical analysis on the number of TEBs in mammary whole mounts. One mammary whole mount per mouse was stained with carmine-alum. Thus, “n” represents the 10 mice we analyzed. We have modified the legend of Figure 1 to "D, E. Quantification of the average number of TEBs and bifurcated TEBs in littermate Crb3fl/fl (n=10) and Crb3fl/fl;MMTV-Cre (n=10) mice at 8 weeks old" in lines 909-911.

      2) Figure 2: in panels B and C it is unclear how the data was quantified; the legend states "n=10", does this mean the experiment in B was done 10 times? And that 10 acini per condition were measured in panel C? In panel D a difference in 0.3% between NC and shCRB3 seems miniscule; do the authors mean 30% instead? And how many acini were counted per condition per (how many) experiments? Same applies to panels G and H, it is unclear how many cells were analyzed per (how many) experiments.

      Thanks for your suggestions. We did not describe the details of the statistical analysis clearly in the Methods. In brief, we took 3-4 random bright-field micrographs of each well in the chamber slide system and repeated the experiment three times. We then counted the number of acini in all micrographs (Figure 2B) and measured the diameter of all acini in each photograph, using the average as the data value (Figure 2C). We also determined the percentage of aberrant acini in each photograph, which was used as the analysis value (Figure 2D). We confirmed that the vertical axis of Figure 2D was indeed mislabeled and should read 30%, and have revised the original figure. For IF analysis of mitotic spindle orientation during lumen formation, we measured the division angle of one mitotically dividing cell per acinus; 3-4 acini were randomly examined in each well of the chamber slide system, and the experiment was repeated three times (Figure 2G and H). We have added a detailed description of these points to the Figure 2 legend. The revised parts are found in lines 922-924, lines 926-927, lines 929-930, and line 932.
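
      As an illustration of the per-micrograph quantification described above, the following Python sketch computes, for one micrograph, the acinus count, the mean acinus diameter, and the percentage of aberrant acini, and then collects these values across micrographs. It is a hypothetical example rather than the authors' analysis script: the function names, the input format (one (diameter, is_aberrant) pair per acinus), and the choice of micrometres as the unit are assumptions, and each micrograph is assumed to contain at least one acinus.

```python
from statistics import mean

def summarize_micrograph(acini):
    """Summarize one micrograph.

    acini : list of (diameter_um, is_aberrant) pairs, one per acinus
            (assumed input format; must not be empty).
    """
    diameters = [diameter for diameter, _ in acini]
    n_aberrant = sum(1 for _, is_aberrant in acini if is_aberrant)
    return {
        "count": len(acini),                                  # cf. Figure 2B
        "mean_diameter_um": mean(diameters),                  # cf. Figure 2C
        "percent_aberrant": 100.0 * n_aberrant / len(acini),  # cf. Figure 2D
    }

def summarize_condition(micrographs):
    """One summary per micrograph, e.g. 3-4 micrographs x 3 experiments."""
    return [summarize_micrograph(acini) for acini in micrographs]
```

      Reporting the aberrant fraction as a percentage (multiplying by 100) rather than as a raw fraction is the kind of axis-scaling detail behind the 0.3 versus 30% labeling issue raised by the reviewer.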

      3) Figure 2: it would be desirable if authors were able to quantify the data in panels E and I.

      Thank you for your comments. According to your suggestions, we performed the quantitative analysis of Figure 2E and I, which is now presented in the new Figure 2D and H.

      4) For all cell-based assays using shRNA to knock down CRB3 (Fig. 2A-H; Fig. 3A-F; Fig. 4C-E; Fig. 5G-J; Fig. 6C; Fig. 7C, D; Fig. 8E-G), it would be desirable to perform rescue experiments to ensure that the observed phenotype of CRB3 depleted cells is specific and not due to off-target effects of the shRNA.

      Yes, rescue experiments involving overexpression of CRB3 in CRB3 depleted cells can accurately account for the specific phenotype as well as eliminate the off-target effects of shRNA. However, our group has long focused on the role of the cell polarity protein CRB3 in contact inhibition and tumorigenesis. Our previous studies have ruled out the off-target effects of shRNA and reported that CRB3 regulates contact inhibition and tumorigenesis through Hippo or Wnt signaling pathways (Cell Death Dis 2017;8(1):e2546, Oncogenesis 2017;6(4):e322, J Cell Mol Med 2018;22(7):3423-33). Therefore, we will pay close attention to rescue experiments to ensure experimental integrity and phenotypic specificity in our subsequent studies.

      5) Figure 3: how many cells were counted/measured per condition (in how many experiments) in panels B, D, H, F, G and H? In panels C and D, what is the CRB3 protein level in these cells? This is of relevance as protein overexpression per se could impinge on ciliation frequency. This question could be addressed by performing a western blot analysis with CRB3 antibody.

      We did not clearly describe the measurement and statistical analysis methods in the previous manuscript. As above, we took 3-4 random IF and SEM micrographs of each sample in one experiment, and the experiment was repeated three times. The numbers of ciliated cells and total cells were then counted, and the proportion of ciliated cells was calculated (Figure 3B, D and F). In these figures, the cilium length of representative ciliated cells was measured in each photograph. In the knockout mouse model, we needed to find intact mammary ductal lumens and renal tubules in IF-stained mouse mammary and renal tissue sections; 5-6 random-field micrographs were taken per section, and the proportion of ciliated cells was determined by counting and averaging. A total of ten mice were analyzed in these experiments (Figure 3G and H). The legend of Figure 3G and H has therefore been partially modified, and a detailed description has been added to the Figure 3 legend. The revised parts are in lines 945-946, lines 950-951, and line 953.

      Thank you for suggesting that we perform a western blot analysis with CRB3 antibody for Figure 3C and D. We have added the western blot analysis of CRB3 in the new Supplementary Figure 3A.

      6) Figure 3G: it is very difficult to see that the red stained structures are primary cilia.

      Yes, the stained primary cilia in the mammary ductal lumen are less clear than those in individual cells and in the renal tubule in Figure 3G. We used the established markers acetylated tubulin and γ-tubulin to stain the primary cilia, which were clearly labeled in individual cells. However, the labeled primary cilia in the renal tubule were longer and showed a more pronounced structure than those in the mammary ductal lumen. In the mammary ductal lumen of the 10 mice we analyzed, the primary cilia were shorter and less distinctly stained than the others shown in Figure 3G. This difference may be due to the distinct characteristics of primary cilia in different tissues.

      7) Figure 4B: how many cells were analyzed in how many experiments?

      Our statistical methods for analyzing cellular experiments using IF were essentially the same. We randomly selected 3-4 IF micrographs of each sample in one experiment, and this experiment was repeated three times. Subsequently, the number of colocalization cells and total cells were counted, and the proportion of cells with pericentrin and CRB3 colocalization was calculated (Figure 4B). The detailed description has been added to the Figure 4 legend. The revised part is in lines 962-963.

      8) Lines 217-219: since the cells were not stained with a cilia marker, only a centrosome marker, the claim that CRB3 localizes to the base of cilia is unsubstantiated.

      Thank you for your comments. The base of the cilium is the basal body, which develops from the mother centriole of the centrosome (Cancer Res. 2006;66(13): 6463-7). First, we found colocalization of CRB3 and pericentrin, a centrosome marker, in MCF10A cells (Figure 4A and B). Second, we verified the colocalization of CRB3 with γ-tubulin, a marker of the basal body of primary cilia, in confluent quiescent cells (Figure 4C and D). In addition, we found that CRB3 was localized at the base of primary cilia labeled with acetylated tubulin (Figure 4E and F). Because of the host species of the commercially available CRB3 antibody, we could only show indirectly, through these experiments, that CRB3 localizes to the base of cilia.

      9) Figure 3 and Figure 4: is it problematic to use gamma tubulin as centrosome marker if CRB3 depletion causes reduced centrosomal recruitment of gamma tubulin ring complex components? Also, in Figure S3A no gamma tubulin staining can be seen in the lower panel, why?

      Thank you for your positive comments. As is well known, γ-tubulin is a marker of the centrosome, and we found that CRB3 depletion causes reduced centrosomal recruitment of gamma tubulin ring complex components. However, Figure 3 illustrates the effect of CRB3 on ciliary assembly, and Figure 4 analyzes the localization of CRB3 in primary cilia. In some reports on ciliary assembly, fluorescent double staining of acetylated tubulin and γ-tubulin has been used to label primary cilia, and the effects of target genes on ciliary number and assembly were analyzed using these markers (Nature. 2013;502(7470): 254-7, Cell. 2007;130(4): 678-90 and so on). Although CRB3 affects the recruitment of gamma tubulin ring complex components, it does not affect the analysis of ciliary number and localization in Figures 3 and 4.

      In Figure S3A, green staining labeled with γ-tubulin could be clearly found in the lower left panel. The representative area from the left amplification may have been poorly selected, resulting in no γ-tubulin staining on the right side. We have updated the lower right panel in the new Supplementary Figure 3B.

      10) Figure S4A: the grouping of indicated proteins is factually wrong. For example, FBF1, SCLT1 and ODF2 are not IFT-B components, and several of the proteins indicated as localizing to the basal body also localize to (unciliated) centrioles. In contrast, CP110 is usually only found on unciliated centrioles and not mature basal bodies. Authors should consult the relevant literature and correct the figure accordingly. Alternatively, this misleading text/grouping could be removed from the figure. Furthermore, in the legend to Figure S4 there is no information provided about this quantitative analysis (how many independent experiments, which cells were analyzed etc.).

      Thank you for your helpful suggestions. We have taken your advice and removed this misleading information from the manuscript, Supplementary Figure 4A and its corresponding legend. We have also added detailed information on this quantitative analysis to the legend of Supplementary Figure 4A. The revised legend is shown in lines 1098-1100.

      11) Figure S4B: how do authors know which of the bands correspond to CRB3 fusion protein?

      Based on the construction strategy of the CRB3-GFP fusion protein (Figure 6D) and its base sequence, we were able to calculate its molecular weight. The molecular weight of the CRB3-GFP fusion protein was then verified by western blotting (Figure 6F and 7A). In addition, exogenous overexpression produces the CRB3-GFP fusion protein in large quantities. On this basis, we infer that the band indicated by the black arrow is most likely the CRB3-GFP fusion protein. To make the molecular weight easier to check, we have labeled the key molecular weight markers in the new Supplementary Figure 4B.

      12) Lines 251-253: this seems like data overinterpretation.

      Thank you for your comments. We have revised this sentence in lines 252-254.

      13) Lines 260-261: the data showing perturbed gamma tubulin localization is not convincing as data was not quantified.

      According to your suggestions, we performed the quantitative analysis of Figure 4C, which is now presented in the new Figure 4E.

      14) Figure 5H and Figure 6C: to show that the GCP6 IP actually worked, these blots should be probed also for GCP6.

      Thank you for your good suggestions. We have added these blots probed for GCP6 in new Figure 5H and 6C.

      15) Figure 5I: how many cells were analyzed in how many experiments?

      Our statistical methods for analyzing cellular experiments using IF were essentially the same. We took 3-4 random IF micrographs of each sample in one experiment, and this experiment was repeated three times. The detailed description has been added to the Figure 5 legend. The revised part is in lines 992-994.

      16) Figure S5: it looks like GCP6 and Rab11 are localizing all over the cell, are the antibodies used for the IFMs specific for these proteins?

      After checking the specificity of these antibodies used for the IFMs, we have decided to delete the corresponding results in the Supplementary Figure 5 and their description in the original manuscript.

      17) Lines 43, 89, and 314-315: the claim that CRB3 directly binds Rab11 is not supported by the data. The data provided only shows that these proteins interact indirectly. To show direct interaction, yeast-2-hybrid analysis or pull-down assays with purified proteins would be required.

      Thank you for your positive comments. Since we were unable to complete the experiments needed to demonstrate a direct interaction between the two proteins, we have revised our conclusions. We have replaced "CRB3 directly binds Rab11" with "CRB3 binds Rab11" in the manuscript.

      18) Figure 6G and lines 314-315: this result is surprising as it indicates GTP- and GDP-locked versions of Rab11 have the same inhibitory effect on CRB3 binding? Please comment, and also indicate how data in Figure 6G was quantified (and how many independent experiments were used for the quantification).

      We were also puzzled by the results shown in Figure 6G. Based on the western blotting bands, we suspected that there may have been some issues with the experiment. Specifically, we believed that the inefficient transfection of Flag-Rab11aWT, Flag-Rab11a[Q70L], Flag-Rab11a[S20V], and Flag-Rab11a[S25N] plasmids, as well as the insufficient amount of GFP antibody used in the co-IP experiment, led to the corresponding bands being too weak and masking the true differences.

      To address this, we optimized the experimental conditions, tightened the experimental controls, and repeated the experiment in triplicate. The new results are shown in the revised Figure 6G. The statistics from the three independent experiments revealed that CRB3b had a stronger interaction with Rab11a[Q70L] and Rab11a[S20V], while showing a weaker interaction with Rab11a[S25N], compared to Rab11aWT. As a result, we revised the original manuscript in lines 308-310 and added a detailed description to the Figure 6 legend in lines 1012-1013.

      19) Figure 8G: data needs to be quantified.

      Thank you for your comments. We replaced the poor-quality bands in the western blot of Figure 8G with better-quality ones. The statistical analysis of the Figure 8G data is shown in Supplementary Figure 6.

      Further minor comments

      1) Abstract should indicate that this study describes conditional knockout of Crb3 in mouse mammary gland epithelial cells.

      This is good writing advice. We have added the relevant description in lines 40-42.

      2) Line 87: specify which gland (mammary?).

      We have changed this to "mammary gland" in line 87.

      3) Line 140: sentence states that knockout of Crb3 is essential for branching morphogenesis in mammary gland development, I do not think this is correct.

      We have removed the inappropriate finding.

      4) Line 152: "formed more number" should be "formed more" or "formed higher number of".

      We modified "formed more number" to "formed more" in line 154.

      5) Lines 157-163: text and logic are difficult to follow for a non-expert.

      We have modified the logic of this paragraph, as detailed in lines 158-165.

      6) Figure 4A, C: figure resolution could be improved. It is difficult to see what the authors claim these figures are showing.

      The clarity of the original images in Figure 4A and C is acceptable, while the images on the right are electronically enlarged. Although there is a decrease in pixels, it can still display our findings.

      7) Figure 7D, E: images look pixelated.

      The clarity of the original images in Figure 7D and E is acceptable using a laser confocal microscope, while the images on the right are electronically enlarged.

      8) Line 222: unclear what authors mean by "detected a series".

      We modified "detected a series" to "some important" in line 226.

      9) Lines 221-225: which cells were used for the analysis in Fig. S4?

      We used MCF10A cells for the analysis in Supplementary Figure 4, and modified its legend in line 1098.

      10) Line 245: what is "cytomembrane"?

      We modified "cytomembrane" to "cell membrane" in lines 246-247.

      11) Lines 246-250: wording is unclear/difficult to understand.

      We have modified this paragraph, as detailed in lines 248-251.

      12) Line 273: should "regimented" be "sedimented"?

      We modified "regimented" to "sedimented" in line 274.

      13) Line 287-288: sentence does not make sense.

      We have removed this sentence.

      14) Figure 5A: it would be desirable to show the original dataset (Excel file) used for generating this figure.

      To maintain data integrity, we should provide the original dataset (Excel file). However, there are some unpublished data in this file that we must withhold for the time being. If needed, the corresponding author can be requested to provide the file.

      15) Lines 298-299: wording is unclear.

      We have modified this sentence, as detailed in lines 296-298.

      16) Lines 285-287: replace "instead of" with "but not".

      We modified "instead of" to "but not" in line 286.

      17) For all IFMs showing merged images of the green and red channel, please also show the red and green channel separately.

      Most of our fluorescence images are presented separately for each channel in this manuscript, with only a few merged images due to space limitations. This type of presentation is commonly used in published papers.

      18) Lines 326 and 327: replace "bonded" with "bound".

      We have made this change in lines 322-323.

      19) Lines 327-328 and 361-364: wording is unclear/grammatically incorrect.

      We have modified these paragraphs, as detailed in line 323 and lines 357-360.

      20) Line 342: what is meant by "the combination of"?

      We modified "the combination of" to "the binding of" in line 338.

      21) Line 365: localization of what?

      This means "subcellular localization" in lines 360-361.  

      Reviewer # 2

      Major points

      1) CRB3 is present in mammals as 2 isoforms, A and B, originating from alternative splicing. In this study, the authors never mention this fact and when using approaches to KO or KD CRB3A/B they are likely to deplete both isoforms which have been shown to have different C-terminal domains and functions (Fan et al., 2007). This is also important for the CRB3 antibodies used in the study since according to the material and methods section they are either against the extracellular domain common to both isoforms or the intracellular domain which is only similar in the domain close to transmembrane between the 2 isoforms. Since the antibodies used in each figure are not detailed it is impossible to know if the authors are detecting CRB3A or B or both. Please provide the information and correct for the actual isoform detected in the data and conclusions.

      Thanks for your positive comments. In mammals, CRB3 has two isoforms, CRB3a and CRB3b, distinguished by alternative splicing within the fourth exon of the CRB3 gene, which in turn produces a protein with 23 amino acid differences at the C terminus. Both CRB3a and CRB3b have mostly identical amino acid sequences, and have indistinguishable molecular weight sizes. As a result, the knockout mouse construction strategy and the design principles of RNAi sequences target both CRB3a and CRB3b. This is described in lines 100-104 and lines 149-150. Additionally, commercially available antibodies detect both CRB3a and CRB3b, as mentioned in line 123 and lines 636-637 in revised manuscript.

      However, it should be noted that our CRB3 overexpression, as shown in the CRB3 structural domain in Figure 6D, refers specifically to the sequence of CRB3b. As a result, we have updated the original manuscript as well as the legends of Figures 3C, 3E, 4A, 5A, 5B, 6D-G, 7A, 7B and Supplementary Figure 2F-H, 3A, 4B, 6B to reflect this change. All instances of overexpressed CRB3 have been changed to CRB3b.

      2) CRB3A and B have been localized in the cilium itself (Fan et al., 2004; 2007) but in the study CRB3A/B does not enter the cilium but is localized in the basal body (figure 4). How the authors reconcile these different localizations?

      Indeed, we found that CRB3 is mainly localized at the basal body of the primary cilium, which differs from previous reports in the literature (Curr Biol. 2004;14(16):1451-61 and J Cell Biol. 2007;178(3):387-98). However, upon closer examination of one of these reports (Curr Biol. 2004;14(16):1451-61), it appears that CRB3 was actually scattered on the primary cilia, with a strong focus at the basal body. Additionally, in rat kidney collecting ducts, the localization of CRB3 on primary cilia was significantly reduced, with obvious localization at the basal body. Another study (J Cell Biol. 2007;178(3):387-98) also reported the co-localization of CRB3b and γ-tubulin in MDCK cells, which is consistent with our conclusion. We further verified the co-localization of CRB3 with the centrosome by overexpressing CRB3b in mammary epithelial cells, indicating that CRB3 mainly localizes to the basal body of the primary cilium. This information is discussed in the Discussion section of the manuscript (lines 400-410).

      3) The authors use GFP-CRB3A/B, it is not stated which isoform, over-expression to localize CRB3A/B in MCF10A cells (figure 4A). The levels of expression appear to be very high in the GFP panel and it is likely that the secretory pathway of the cells is clogged with GFP-CRB3A/B in transit from the ER to the plasma membrane. Thus, the colocalization with pericentrin might be due to the accumulation of ER and Golgi around the centrosome. This colocalization should be done with the endogenous CRB3A/B and with a better resolution.

      Thank you for your comments. We were also interested in the co-localization of endogenous CRB3 and centrosome proteins. However, the only commercially available CRB3 antibody is raised in rabbit, and the pericentrin antibody that works best (Abcam, ab4448) is also raised in rabbit. We had difficulty finding commercial centrosome-associated antibodies raised in other species. Therefore, we examined the co-localization of endogenous CRB3 with γ-tubulin in Figure 4C and combined the results with those of exogenous CRB3 to illustrate the co-localization of CRB3 with centrosomes.

      4) The staining for CRB3A/B in figure 4C (red) is striking with a very strong accumulation in an undefined intracellular structure and the authors do not provide any explanation for such a difference with the GFP-CRB3A/B just above.

      Thank you for your good suggestions. The immunofluorescence images of GFP-CRB3 in Figure 4A were obtained using a fluorescence microscope, while the images of endogenous CRB3 were obtained using a laser confocal microscope. The fluorescence microscope excites the fluorescent dye and collects the emitted light from the whole sample, so it presents the full fluorescent signal. In Figure 4A, we can clearly see the full distribution of exogenous CRB3 in MCF10A cells, including its tight junctional localization, consistent with previous reports in the literature, and its co-localization with centrosomal proteins. Laser confocal microscopy, on the other hand, uses a laser as the light source to excite the fluorescence within the sample point by point, and employs a precision pinhole that gives it strong optical sectioning capabilities. In the specific analysis of endogenous CRB3 co-localization with centrosomes and the primary cilium, signals at tight junctions must be excluded. Therefore, Figure 4C represents the fluorescence signal at the focal plane where intracellular CRB3 co-localizes with γ-tubulin. The two methods use different detection means and techniques, and are not directly comparable.

      5) The staining in figure 4E is also different from those shown in figure 4F in which the CRB3A/B staining is right at the base of the axoneme while it is not the case in figure 4E where we can see a red dot close to but not right at the base of the axoneme.

      Thank you for your comments. The new Figure 4F displays the localization relationship between CRB3 and the primary cilium, analyzed using laser confocal microscopy. Because this microscope images a single focal plane, the choice of plane may cause the red dots to appear close to, rather than right at, the basal body of the primary cilium. However, the new Figure 4G, based on 3D reconstruction scanning, clearly demonstrates the localization of CRB3 at the basal body of the primary cilium in the same cells under the same conditions.

      6) The authors claim that CRB3A/B interacts directly with Rab11 but they only show co-immunoprecipitation experiments from cell lysates which do not support direct interactions. The only way to show a direct interaction is to produce both proteins in vitro. Thus, the term direct interaction should be removed.

      Thank you for your positive comments. Since we were unable to complete the relevant experiments to demonstrate a direct interaction between the two proteins, we have revised our conclusions. We have replaced "CRB3 directly binds Rab11" with "CRB3 binds Rab11" in the manuscript.

      7) In addition, the authors claim (Line 251/252) that Rab11 is necessary for the transport of CRB3A/B but they should KD Rab11 to show this.

      Thank you for your good suggestions. It is essential to observe CRB3 trafficking after Rab11 knockdown. However, in Figure 5C, we used the endocytosis inhibitor dynasore, which also inhibits Rab11-positive endosomes. This result shows that dynasore can significantly inhibit CRB3 trafficking in MCF10A cells. We believe that this experiment partially demonstrates that inhibiting Rab11 function can affect CRB3 trafficking.

      8) The domain of CRB3A/B that is necessary for the interaction with Rab11 is the N-terminal part of the extracellular domain. This domain is thus inside the transport vesicles and not accessible from the cytoplasm. Given that Rab11 is a cytoplasmic protein, how could the 2 proteins interact across the membrane? The authors do not even discuss this essential point for their hypothesis.

      Thank you for your positive comments. As shown in the schematic model in Figure 9, we believe that when cells form tight junctions, CRB3 is primarily located on the cell membrane. Subsequently, endosomes are involved in the intracellular degradation process of CRB3 on the cell membrane. Intracellular CRB3 can bind to Rab11 through the extracellular domain, which in turn participates in primary cilia assembly. We have made detailed modifications to lines 418-421.

      9) Figures are not numbered.

      Thank you for your comments. We have updated the numbers in the original manuscript as well as the legends of Figures 1D, 1E, 2B, 2D, 2F, 2G, 3B, 3D, 3F-H, 4B, 4E, 5I, 6, 8G and Supplementary Figure 1E, 2, 3C, 4A, 5B, 6.

      Minor points

      1) The authors cite several studies showing that a down regulation of CRB3A/B in human cells promotes cancer but other studies show the contrary: Lin et al., 2015 for example. Please discuss these discrepancies.

      Thanks for your good suggestion. We have included additional studies with contrasting results in the discussion section, specifically in lines 378-380.

      2) Line 98: "exhibit smaller" smaller than what?

      We modified "exhibit smaller" to "exhibit smaller size" in line 97.

      3) Line 152: "form more number, ..." ???

      We modified "formed more number" to "formed more" in line 154.

      4) Line 180: "Compared with the control, the number of cells with primary cilium was significantly increased". To me it is the contrary! This part is not clear at all. Please rewrite.

      We have revised the sentence in lines 183-185.

      5) Authors should check and review extensively for improvements to the use of English.

      Thanks for your good writing advice. We have carefully reviewed and revised the entire manuscript to improve its readability.

    1. Reviewer #1 (Public Review):

      In principle a very interesting story, in which the authors attempt to shed light on an intriguing and poorly understood phenomenon; the link between damage repair and cell cycle re-entry once a cell has suffered from DNA damage. The issue is highly relevant to our understanding of how genome stability is maintained or compromised when our genome is damaged. The authors present the intriguing conclusion that this is based on a timer, implying that the outcome of a damaging insult is somewhat of a lottery; if a cell can fix the damage within the allocated time provided by the "timer" it will maintain stability, if not then stability is compromised. If this conclusion can be supported by solid data, the paper would make a very important contribution to the field.

      However, the story in its present form suffers from a number of major gaps that will need to be addressed before we can conclude that MASTL is the "timer" that is proposed here. The primary concern being that altered MASTL regulation seems to be doing much more than simply acting as a timer in control of recovery after DNA damage. There is data presented to suggest that MASTL directly controls checkpoint activation, which is very different from acting as a timer. The authors conclude on page 8 "E6AP promoted DNA damage checkpoint signaling by counteracting MASTL", but in the abstract the conclusion is "E6AP depletion promoted cell cycle recovery from the DNA damage checkpoint, in a MASTL-dependent manner". These 2 conclusions are definitely not in alignment. Do E6AP/MASTL control checkpoint signaling or do they control recovery, which is it?

      Also, there is data presented that suggest that MASTL does more than just controlling mitotic entry after DNA damage, while the conclusions of the paper are entirely based on the assumption that MASTL merely acts as a driver of mitotic entry, with E6AP in control of its levels. This issue will need to be resolved.

      Finally, the authors have shown some very compelling data on the phosphorylation of E6AP by ATM/ATR, and its role in the DNA damage response. But the time resolution of these effects in relation to arrest and recovery has not been addressed.

      Revised manuscript:

      I think the authors did a good job in revising the paper, and provide compelling support for a timer function in the checkpoint. I do think they still have missed one important point: how MASTL could act as a timer to control recovery. The data clearly show that MASTL somehow controls ATM/ATR activity, whilst their final model (fig. 9) places MASTL upstream of CDK activity, without mentioning its feedback on ATM/ATR. I think there are 2 possible explanations for the timer function of MASTL they have discovered here, and both may be relevant. The first is enhanced CDK activation by direct control of CDK phosphorylation through MASTL/B55/PP2A. The second is through MASTL-mediated shut-down of ATM/ATR activation (mechanism to be determined), which is also reported here. Their final model and discussion do not display sufficient appreciation for this latter option, and I would argue that the HU-recovery experiment shown in Fig. 5B is actually in strong support of the second explanation, rather than the first.

    1. Public Review:

      In countries endemic for P vivax the need to administer a primaquine (PQ) course adequate to prevent relapse in G6PD deficient persons poses a real dilemma. On one hand PQ will cause haemolysis; on the other hand, without PQ the chance of relapse is very high. As a result, out of fear of severe haemolysis, PQ has been under-used.

      In view of the above, the Authors have investigated in well-informed volunteers, who were kept under close medical supervision in hospital throughout the study, two different schedules of PQ administration: (1) escalating doses (to a total of 5-7 mg/kg); (2) single 45 mg dose (0.75 mg/kg).

      It is shown convincingly that regimen (1) can be used successfully to deliver within 3 weeks, under hospital conditions, the dose of PQ required to prevent P vivax relapse.

      As expected, with both regimens acute haemolytic anaemia (AHA) developed in all cases. With regimen (2), not surprisingly, the fall in Hb was less, although it was abrupt. With regimen (1) the average fall in Hb was about 4 g/dl. In only one subject did the fall in Hb mandate termination of the study.

      Since the data from the Chicago group some sixty years ago, there has been no paper reporting a systematic daily analysis of AHA in so many closely monitored subjects with G6PD deficiency. The individual patient data in the Supplementary material are most informative and more than precious.

      Having said this, I do have some general comments.

      1. Through their remarkable Part 1 study, the Authors clearly wish to set the stage for a revision of the currently recommended PQ regimen for G6PD deficient patients. They have shown that 5-7 mg/kg can be administered within 3 weeks, whereas the currently recommended regimen provides 6 mg/kg over no less than 8 weeks.

      2. Part 2 aims to show that, as was known already, even a single PQ dose of 0.75 mg/kg causes a significant degree of haemolysis: G6PD deficiency-related haemolysis is characteristically markedly dose-dependent. Although they do not state it explicitly in these words (I think they should), the Authors want to make it clear that the currently recommended regimen does cause AHA.

      3. Regulatory agencies like to classify a drug regimen as either SAFE or NOT-SAFE; they also like to decide who is 'at risk' and who is 'not at risk'. A wealth of data, including those in this manuscript, show that it is not correct to say that a G6PD deficient person when taking PQ is at risk of haemolysis: he or she will definitely have haemolysis. As for SAFETY, it will depend on the clinical situation when PQ is started and on the severity of the AHA that will develop.

      The above three issues are all present in the discussion, but I think they ought to be stated more clearly.

      Finally, by the Authors' own statement on page 15, the main limitation is the complexity of this approach. The authors suggest that blister-packed PQ may help; but to me the real complexity is managing patients in the field versus the painstaking hospital care in the hands of experts, of which volunteers in this study have had the benefit. It is not surprising that a fall in Hb of 4 g/dl is well tolerated by most non-anaemic men; but patients with P vivax in the field may often have mild to moderate to severe anaemia; and certainly they will not have their Hb, retics and bilirubin checked every day. In crude approximation, we are talking of a fall in Hb of 4 g/dl with regimen (1), as against a fall in Hb of 2 g/dl with regimen (2), which is part of the currently recommended regimen: it stands to reason that, in terms of safety, the latter is generally preferable (even though some degree of fall in Hb will recur with each weekly dose). In my view, these difficult points should be discussed deliberately.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      1. General Statements

      Reply to general assessment of referee #2:

      1. General assessments: The current study adds some to these observations…some of these observations are incremental…biological significance is limited. While this reviewer does not suggest additional experimentation, this manuscript would be suitable as a resource paper.

      Reply: It appears we were not clear enough in explaining the novel aspects of our study.

      The starting points are two published studies from our lab demonstrating a global increase of ISGF3 association with ISG promoters in IFNγ-treated cells and a remarkable similarity of IFNγ- and type I IFN-induced early transcriptome changes. These findings challenge the notion in the field (as mentioned by the referee) that IFNγ specificity is produced by the predominant deployment of STAT1 homodimers. We thus tested the hypothesis that the specificity of the IFNγ-induced transcriptome is generated over time, rather than during the early response, and relies on secondary responses to transcription factors such as IRF1. In contrast, IRF1 plays no or only a small role in the type I IFN response that utilises ISGF3 and/or unknown secondary factors in the delayed response. We tested this hypothesis with PRO-seq technology to rule out confounding effects of mRNA processing over a 48h period. The data are clear in showing that many genes associated with the antibacterial or antiparasitic profile of activated macrophages are indeed much more abundant in late-stage rather than briefly IFNγ-treated macrophages and these delayed changes are to a large extent dependent on IRF1. Our findings are based on the best available technologies, a combination of nascent transcript analysis with genetics and protein interaction studies. In addition, our findings rule out alternative models of sustained or secondary ISG transcription, such as the employment of alternative ISGF3 complexes (such as STAT2-IRF9) or of ISGF3 complexes formed with unphosphorylated STAT1 and STAT2. We provide evidence for higher order waves of transcription caused by unknown transcription factors that are produced by transcriptional activation of ISGF3 or IRF1 target genes and identify candidates among the AP1 and Ets transcription factor families. We agree that some of the data are confirmatory rather than novel (i.e. some of the genes we describe were known from previous literature to be IRF1 targets), but it is the systems approach of our study, and particularly the delineation of conditions under which the largely neglected delayed response diverts the IFNβ and IFNγ-induced transcriptomes, that generates a comprehensive and conclusive view of IFNγ acting predominantly as a macrophage activating factor, and IFNβ being an essential antiviral cytokine. We do think this main outcome is immunologically meaningful and not incremental. For this reason, we would prefer to publish the paper as a relevant contribution to innate immunology rather than a resource. Emphasizing our point, a paper appeared in ‘Cell’ while our study was under review, showing that human IRF1 mutations cause mendelian susceptibility to mycobacterial disease (MSMD), a term coined by JL Casanova and colleagues for immunological defects that reduce the ability of macrophages to cope with intracellular bacteria (new ref. 65). This important study emphasizes the main conclusions of our study about the relevance of IRF1 for macrophage activation. We discuss this paper on p. 14 lines 9-14.

      Revision: We tried to better explain the scientific motivation for this study and the significance of the results (p. 4, lines 12-25).

      Revision plan: n. a.

      2. Description of the planned revisions

      Referee #3; major comment 1:

      Fig. 1d is difficult to interpret and misleading for many reasons. First, the cluster numbering is disconnected from the cluster order; why not number them based on the hierarchical clustering and write the cluster number beside the cluster itself? Second, having a 2-color gradient is misleading; negative values shouldn't be in the same color tone as the positive values. Third, the authors did not provide adequate rationale for using only the top 1,000 most expressed genes. Why not use all the genes differentially expressed in at least one of the conditions to provide a comprehensive analysis? Could this potentially lead to bias in the data, and is there any information lost by not using the lower-expressed gene fraction? Fourth, it is not clear what the color scale is representing and how the data was transformed. Was a mean centering of the expression values of the log2FC applied to the RNA-seq data to facilitate clustering? Mean centering and z-scoring is a common technique used to adjust expression data, but it can potentially exaggerate differences between samples. More information about the data and analysis should be provided, as it is difficult to determine whether this was a valid approach or not.

      Reply:

      • To create the heatmap, we used the pheatmap package from R and the cutree_rows option to separate 11 clusters with strikingly different patterns of gene expression based on visual exploration. The numbering was autogenerated by the program.
      • The data is now shown in red-blue.
      • We restricted our list to only 1000 genes from each comparison as we aimed to analyze the prominent patterns of gene expression across timepoints. Considering all differentially expressed genes based on a padj value would also include genes expressed at very low levels as evident from the low baseMean values obtained from DESeq2. Hence, we applied a selection of 1000 genes which effectively represented the major patterns of gene expression across timepoints.
      • A variance-stabilizing transformation was applied to read counts obtained from PRO-seq using the DESeq2 package. The transformed reads were z-score normalized and used for hierarchical clustering by the “Ward.D2” method using the pheatmap package in R. A total of 3126 genes were used for this analysis. 11 distinct clusters were defined using the cutree_rows option. The color scale represents z-score normalized counts. The genes represented in the heatmap were selected based on the following criteria: each timepoint of interferon treatment was compared to the homeostatic condition (untreated sample) in wildtype BMDMs. The differentially expressed genes from each comparison were selected based on the filtering criteria: absolute log2FoldChange >=1 and adjusted p value <0.01 by Wald test. Following the differential analysis, the first 1000 differentially expressed genes in each treatment condition (ordered by adjusted p value) were selected for both IFN types and combined into a list of 3126 unique genes. The scale in the heatmap represents z-scores of variance-stabilized reads, calculated across all genotype and treatment conditions, separately for each IFN type.
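
      The selection and scaling steps described in this reply can be sketched as follows. This is only an illustrative outline in Python, not the authors' R code: the function names and the input layout (one list of per-gene records per comparison, with the DESeq2 column names `log2FoldChange` and `padj`) are assumptions, and the use of the sample standard deviation for the z-score is likewise an assumption. The Ward.D2 clustering and the cut into 11 clusters are not reproduced here.

```python
from statistics import mean, stdev


def select_top_genes(results, lfc_cutoff=1.0, padj_cutoff=0.01, top_n=1000):
    """Return gene names passing the filters for ONE comparison.

    `results` is a list of dicts with keys 'gene', 'log2FoldChange' and
    'padj' (hypothetical layout mirroring a DESeq2 results table).
    Genes are kept if |log2FoldChange| >= lfc_cutoff and padj < padj_cutoff,
    then ordered by padj and truncated to the first `top_n`.
    """
    passing = [
        r for r in results
        if r["padj"] is not None  # DESeq2 can report NA for padj
        and abs(r["log2FoldChange"]) >= lfc_cutoff
        and r["padj"] < padj_cutoff
    ]
    passing.sort(key=lambda r: r["padj"])
    return [r["gene"] for r in passing[:top_n]]


def union_of_selections(comparisons, **kwargs):
    """Combine the per-comparison selections into one list of unique genes.

    `comparisons` maps a comparison label (e.g. a treatment and timepoint
    versus untreated) to its list of result records.
    """
    selected = set()
    for results in comparisons.values():
        selected.update(select_top_genes(results, **kwargs))
    return sorted(selected)


def row_zscores(values):
    """Z-score one gene's variance-stabilized values across samples.

    Uses the sample standard deviation (an assumption); a gene with no
    variation is returned as all zeros to avoid division by zero.
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]
```

      Because a gene can pass the filters in several comparisons, the union is smaller than the sum of the per-comparison lists, which is consistent with the reply reporting 3126 unique genes rather than a multiple of 1000.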

      Revision plan: We will label the clusters with the cluster number next to it in addition to the color codes.

      Referee #3; major comment 3:

      The large standard deviation bars in the claim that ChIP data confirmed the binding of ISGF3 components to the promoter of Mx2 cast doubt on the validity of the results and conclusions. The authors should consider additional experiments or complementary analyses to validate their findings. Or, alternatively, adjust their claims accordingly.

      Reply: To demonstrate sufficient data quality, the Stat1/Stat2 ratio was calculated separately for the early (1.5 h) and late (48 h) time points. An unpaired two-tailed t test comparing this ratio between 1.5 h and 48 h shows that the two are not significantly different. This indicates that all ISGF3 components are associated with ISG during both early and delayed responses, i.e., that STAT2/IRF9 complexes are unlikely to contribute to delayed ISG control. However, we agree with the referee that the standard deviations of the kinetic ChIP experiment are high and that it would be good to generate additional data.

      Revision plan: We will perform additional ChIP experiments to improve the statistical power of the results in fig. S2c.

      Referee #3, major comment 6:

      The authors interpret their ATAC-seq and ChIP-seq results based on a 2kb window to the TSS of genes, not considering relatively close enhancers or longer range cis-regulatory interactions in their interpretation. For example, they mention on p.7 "Contrasting the strong binding of IRF9 and IRF1 to the Mx2 (cluster 2) and Gbp2 (cluster 9) promoters, respectively, we saw no evidence for direct binding to Lrp11 (cluster 3) and Ptgs2 (cluster 10)", but on Fig 3d they show only the proximal regions. No scale bars are shown either. Moreover, exploring the same published IRF1 ChIP-seq dataset, there is a clear IRF1 binding site at the promoter of Ptgs2, while the authors report none.

      Reply:

      • According to the literature (e. g. refs. 11, 27), most IFN-induced accessibility changes occur in the vicinity of the TSS of ISG. This is further strengthened by the data shown in this manuscript. In addition, most functionally validated GAS and ISRE sequences are in the DNA interval chosen for our analysis. While distal ISG enhancers have been reported (e. g. DOI: 10.26508/lsa.202201823), an analysis beyond the placement of most control regions increases the risk of wrong assignments between ISG and their regulatory elements, hence the causality between transcription factor binding and accessibility changes.
      • We extended the regions for the analysis of the Lrp11 and Ptgs2 regulatory regions and found no evidence for the binding of ISGF3 or IRF1. We find no evidence for a clear peak in the Ptgs2 promoter. There is a peak called by the Macs2 algorithm, but visual inspection of the track (bigwig file) shows it consists of a minor increase in reads above background that does not suggest a bona fide IRF1 binding site (see below). This view is supported by our inability to find an IRF binding site in the vicinity of the peak.

      IRF1 binding indicated by bigWig browser tracks and corresponding peak files detected at the locus. We retrieved the peak file from Langlais et al., 2016 and called peaks using MACS2, although we used the mm10 genome, as the analysis in the original paper was done with the mm9 genome. The peak identified here appears to be an artefact of the MACS2 program, as there is no evident enrichment at the gene promoter region upon inspection of the bigWig files.

      Revision plan: Scales will be added to the browser tracks as requested.

      Referee #3, major comment 7:

      Lack of statistical analysis on chromatin accessibility claims: The authors claim that ATAC-seq data in BMDMs stimulated with IFNβ or IFNγ for a short (1.5 hours) or long (48 hours) period reveals a striking similarity between transcription and the general trends of chromatin accessibility at regions up to 1000 bp upstream of the TSS (Fig. 2a), suggesting continuous chromatin remodeling during the transcriptional response. However, I would like to know if this conclusion is well-supported by the correlation between the chromatin accessibility from ATAC-seq data from only one sample and the PRO-seq data.

      Reply: See revision plan.

      Revision plan: We will analyze whether the single experiments support the conclusions derived from the z-score of the triplicate samples.

      Referee #3, major comment 8:

      The need for additional experiments to verify claims such as the dependence of Ifi44 on IRF1 for gaining ATAC signal, as stated in the claim, "Expression required IRF1 for both, but accessibility of the Ifi44 regulatory region depended upon IRF1 whereas that of Gbp2 acquired an open structure independently of IRF1 (Fig. 5c)."

      Reply: We think the lack of clarity might be related to the size of figures 5a and 5b and the density of the dots in some areas of the plot. We agree it is very difficult to assign our gene labels unambiguously to a single dot.

      Fig. 5a combines ATAC-seq data in wt and IRF1 knockout cells with the expression data from the PRO-seq experiment. Fig. 5b has the same set-up, but IRF9-deficient macrophages are analyzed.

      Blue dots show ATAC-seq signals induced by IFN treatment. Violet dots represent genes that require IRF1 (Fig. 5a) or IRF9 (Fig. 5b) for transcriptional induction. Yellow dots mark genes such as Ifi44 requiring IRF1 (Fig. 5a) or IRF9 (Fig. 5b) for both expression and the accessibility change in the promoter region. Fig. 5c visualizes representative examples of genes whose accessibility is coupled to the transcription factor dependence of the transcriptional induction (Ifi44), or not (Gbp2). Thus Fig. 5c must be interpreted based on the dot color code in Fig. 5a, and we admit this has been difficult with the figure in its present form.

      Revision plan: We will improve the clarity of figs 5a and 5b in several ways:

      • We will label the panels to better indicate the intersected data sets.
      • We will increase the size of the panels and figure legends and make sure that the correspondence between gene names and dots are unambiguous.
      • We will include trend lines of the Ifi44 and Gbp2 genes to visualize their induction and IRF1 dependence.

      Referee #3, major comment 13 (see also section 3):

      The authors have not adequately addressed the methodological limitations in their discussion, which extends beyond the aforementioned comments. It is suggested they include a comprehensive discussion of the claims made pertaining to the necessity of IRF1 for accessibility and the potential biases in the interactomes, along with their associated consequences.

      Reply: The contribution of IRF1 to the accessibility of ISG promoters emerges from the data in figure 5a, whose clarity will be improved (see reply to point 8). We do not interpret the impact of IRF1 beyond the data; in fact, we state a relatively minor effect of IRF1 in the control of promoter accessibility (p. 10, lines 20-22) and we have added a reference in agreement with an impact of IRF1 on basal expression of antiviral genes (ref. 39, as suggested by the referee).

      We have added discussion on potential limitations of the TurboID approach (p. 11, lines 22-24 and p. 15, lines 3-11).

      Revision plan: Improvement of fig 5a (see ref. #3, point 8).

      Referee #3, minor comment 2

      Fig 1e. The color scales on the GO enrichment graphs are misleading since they use the same blue-to-red gradient for adj p-values ranging from 10^-25 to 10^-49 and 0.008 to 0.016, which could be considered non-significant.

      Reply: We agree that this is confusing. It results from automated assignments of the color gradients by the software.

      Revision plan: We will investigate possibilities to change color codes for different ranges of p values.

      Referee #3, minor comment 4

      The incomplete schema in Figure 1a, which only focuses on PRO-seq and does not include the ATAC-seq element.

      Reply: We will add a new figure to visualize the set-up of the ATAC seq experiments and their intersection with the Pro-seq data.

      Revision plan: We will add a new figure in accordance with the referee’s request.

      Referee #3, minor comment 6

      The clearer labeling of Figure 5a and 5b.

      Reply: Please refer to our reply to major point 8.

      Referee #3, minor comment 10

      Fig S1b, S3b. The PRO-seq was generated in triplicates, hence these graphs should include the Log2FC for the individual data points.

      Reply: The Log2FC from DESeq2 were calculated from the triplicates, the software does not compute Log2FC from individual replicates.

      Revision plan: We mention the p-values for the Log2FC to show the degree of consistency (figure legends). We will provide a table with log2FC and corresponding padj values of the genes represented at each timepoint (table_showing_padj_values_and_log2fc).

      Referee #3, minor comment 12

      In the genomic snapshot shown, only bars or fading triangles are shown in place of the gene body. The authors should provide an accurate gene structure; i.e., exons and introns.

      Reply: We will try to include the exon-intron structure wherever the size of the figure allows this.

      Revision: n. a.

      Revision plan: If figure size permits, we will add the exon-intron structure of the genes in browser tracks as requested.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Referee #1, major comment 1

      Figure 2. Difficult to interpret data as it is presented. Consider quantifying figure 2C in order to make "changes in Pol II pausing were more pronounced during IFNb signaling" statement more apparent.

      Reply: We presented the pausing data in two different graphic representations (figures 2c and S2) to make the understanding of the information content easier. In hindsight we may have generated more confusion than clarity.

      Revision: We removed the original figure 2c and replaced it with the original figure S2. This representation is quite intuitive, as the graphs provide a direct quantitative logarithmic display of whether and by how much the relative amount of paused polymerase changes when comparing IFN-treated and untreated cells. The calculation of these ratios is now explained better in the legend to figure 2.

      Referee #1, major comment 2

      How are you distinguishing autocrine signaling in the BMDMs driven by IFN treatment from late transcripts (for example, at 48 hours are differential genes due to autocrine cytokine signaling or are they truly late transcripts)?

      Reply: We do not exclude autocrine effects. In the case of ISGs, the most likely autocrine factor would be secreted interferon. According to our Pro-seq data, the differentially expressed genes do not include any interferon genes. That being said, it is possible that the transcription factors from the AP1 family we hypothesize as drivers of secondary or tertiary waves of transcription are activated by non-IFN cytokines secreted from IFN-treated cells (see also reply to comment 3).

      Revision: We now mention that enhanced IFN production is not sustaining ISG responses (p.5 lines 18/20). We mention the possibility that secreted factors may drive secondary or tertiary waves of ISG transcription (p. 8, lines 21/23).

      Referee #1, major comment 3

      Figure 3D. Authors choose Gbp2 (as positive control for IFNg driven gene), but don't show that Gbp2 is a IFNb independent gene. Consider using IRF1 KO BMDMs in this data as well.

      Reply: This is a misunderstanding. Gbp2 is not shown as an IFNγ-specific gene (its induction by both IFN types has been shown previously and emerges from our Pro-seq analysis, see also response to minor issue no. 2). It represents the cluster of genes that are sustained specifically after IFNγ treatment in an IRF1-dependent manner. The purpose of fig. 3D is to show that not all ISGF3/IRF9-dependent genes have promoter binding sites for ISGF3 and not all IRF1-dependent genes have binding sites for IRF1. This suggests indirect effects of both transcription factors in sustaining IFN-induced transcription (in line with the referee’s comment 1).

      Previous figure S3e (now S2f) confirms binding of IRF1 to the GBP2 promoter by ChIP with kinetics correlating to its transcriptional effect. This experiment is normalized with an IgG control. IRF1 knockout cells did not produce a ChIP signal with IRF1 antibody, as expected (data not shown).

      Revision: We better explain the rationale behind the experiments shown in figure 3D (text on p8, lines 12-16). In addition, we show the trend line of Gbp2 expression in WT vs IRF1KO as well as that of additional genes showing delayed/sustained responses in the new Figure S3.

      Referee #1, minor comment 2

      Define known IFNg and IFNb driven genes when they are introduced in figure 2 rather than in discussion.

      Reply: Following the referee’s suggestion we provide the examples of IFNβ and IFNγ-controlled genes and the characteristics of their regulation in the context of our description of the results displayed by fig. 2 (p.6 lines 15-21). This includes Gbp2 (see major issue no. 3).

      Revision: The text on p. 6 lines 15-21 has been modified in accordance with the request.

      Referee #1, minor comment 4

      Unclear whether IRF1 expression in figure 3A is from whole cell lysate or nuclear fraction.

      Reply: We indicate in the figure legend that whole cell lysates were used.

      Revision: We added a sentence with the relevant information in the legend of figure 3.

      Referee #1, minor comment 5

      Authors suggest IFNb treatment induces less IRF1 at later time points, however loading control also seems slightly lower than in other conditions. Is it possible that IFNb treated cells are dying at later time points, given that type I IFN signaling can be pro-apoptotic?

      Reply: The graph below the blot represents quantified IRF1 signals, normalized to the loading control. It shows that the differences are not generated by unequal loading of the blotted gel. We and others have shown that IFNβ may indeed enhance macrophage death, however only when the cells are simultaneously infected with an intracellular pathogen (e.g. new ref. 25). These studies also show that treatment with IFNβ alone over periods used in the present study does not affect macrophage viability.

      Revision: We added a sentence about the viability of IFN-treated macrophages (p. 4, lines 31-32).

      Revision plan: n. a.

      Referee #2, major comment 3

      The sequencing and BioID data are not submitted to public databases.

      Reply: An accession number has been added.

      Revision: The accession number was added on p.29, line 25.

      Referee #3, major comment 1 (see also revision plan, section 2):

      Revision: The rationale for using the top 1,000 genes is explained (p. 5, lines 7-9). The description of the pro-seq read count processing has been extended in accordance with our reply to the referee in the legend of figure 1d and in the methods section (p. 33, lines following line 10).

      Referee #3, major comment 2

      Fig 2c. The authors claim that RNA Pol II pausing is a major factor in controlling the dynamics of ISG transcription. However, they did not provide sufficient explanation of the results, and in all fairness there is not much variation between the clusters to sustain the claim that this is a major factor in ISG transcriptional control.

      Reply: We agree with the referee that we cannot posit RNA pol II pausing as a major factor for the differences of transcriptional control of ISG in individual clusters. We have made sure to remove any statements suggesting this possibility. We also try to better integrate our findings with RNA pol II pausing into the existing literature.

      Revision: We added relevant literature on p. 6 lines 28-30 and p. 7, lines 4-6.

      Referee #3, major comment 4

      On p.5, the authors mention "Representative browser tracks from the Gbp2 and Slfn1 genes further validate this observation" but they are simply referring to genome browser snapshot, i.e., specific genomic examples, extracting from the same single dataset. Without using an independent dataset, this can not "further validate" the initial findings.

      Reply: We agree the wording is incorrect.

      Revision: We changed the paragraph describing this experiment (p. 6, lines 15-21).

      Referee #3, major comment 5

      IRF1 was successfully pulled down with STAT1 bait but not in the reciprocal experiment. The author should discuss this point as it is important for the conclusions. Could it potentially indicate issues with the technique used, and if this could introduce any bias into the results. The statement, "In contrast, interactors of the IRF1 bait did not include STAT1. This discrepancy could result from steric constraints of the tagged proteins due to the limitation of the 10nm distance reached by the biotin ligase," does not seem to be sufficient to explain this discrepancy.

      Reply: STAT1 was present in the IRF1 pull-down and the interaction increased significantly after IFN treatment, but after normalization to the NLS control it did not conform to our criterion of a 95% confidence interval for the FDR. To be consistent we did not include it in the list of IRF1 interactors. We have observed on several occasions that the significance of proximity is not reciprocal, even for well-documented physical interactions. A prime example for this is the interaction between STAT1 and IRF9 in IFN-treated cells, which is recorded in the STAT1 pull-down but not in the pull-down with IRF9 (ref. 10). Apart from steric reasons, the lack of reciprocity may result from different signal/noise ratios in pull-downs with different baits.

      Revision: We mention that IRF1 was a STAT1 interactor below the statistical cut-off (p. 11, lines 26-28) as well as the possibility of different signal/noise ratios in the IRF1 and STAT1 pull-downs on p.11, lines 22-24.

      Referee #3, major comment 9

      In the figure legends, there is missing information about the number of times experiments were replicated, suggesting that some were done a single time. Moreover, some graphs are missing statistical analysis, e.g., in Fig S3cS3e, S3f, the ChIP-qPCR experiments were done on biological triplicates, there is no mention of statistical test performed, it is not mentioned what the error bars represents (SD, SEM, etc.) and the variance is large, but the authors still interpret these results as significant enrichment of the transcription factors to the Mx2 promoter.

      Reply: Where missing, the relevant information has been added to figure legends. In brief, all experiments represent at least three biological replicates. The only exception is the western blot shown in figure S3a (now S2a), which represents two independent replicates. Here, the clarity of the difference of IRF1 expression and the fact that the only purpose is to show that Raw264.7 macrophages behave like bone marrow-derived macrophages in fig. 3a justifies the omission of another replicate (please see also answer to point 3).

      Revision: The relevant information has been added to figure legends where necessary (figs. 1, a, 3a, 6a-f, S1, S4, S5).

      Referee #3, major comment 10

      Another example are the RNA Pol II pausing index ratios, which show minor variations and are not supported by statistics to support a possible significance. Proper description, replication and statistical analyses of the results are critical.

      Reply: We agree.

      Revision: Statistics underlying the RNA Pol II pausing data are included in supplementary data 2.

      Referee #3, major comment 11

      The authors used CRISPR-Cas9 genome editing to generate knockout cell lines. However, they did not verify the knockouts at the protein level. Further experiments could confirm that the targeted proteins are not expressed in the knockout cell lines.

      Reply: We included a western blot showing the lack of IRF1 and STAT1 expression in the respective cell lines.

      Revision: New figure S6.

      Referee #3, major comment 12

      On p.9, it is mentioned "IRF1 affects chromatin structure ...". Here chromatin structure is related to minor changes in chromatin accessibility, this can not be qualified as changes in chromatin structure.

      Reply: ‘structure’ has been changed in accordance with the request.

      Revision: ‘structure’ has been replaced with ‘accessibility’ (p. 10, lines 19 and 21).

      Referee #3, major comment 13 (see also section 2, revision plan, major comment 8)

      The authors have not adequately addressed the methodological limitations in their discussion, which extends beyond the aforementioned comments. It is suggested they include a comprehensive discussion of the claims made pertaining to the necessity of IRF1 for accessibility and the potential biases in the interactomes, along with their associated consequences.

      Reply: The contribution of IRF1 to the accessibility of ISG promoters emerges from the data in figure 5a, whose clarity will be improved (see reply to point 8). We do not interpret the impact of IRF1 beyond the data; in fact, we state a relatively minor effect of IRF1 in the control of promoter accessibility (p. 10, lines 20-22) and we have added a reference in agreement with an impact of IRF1 on basal expression of antiviral genes (ref. 39, as suggested by the referee).

      We have added discussion on potential limitations of the TurboID approach (p. 11, lines 22-24 and p. 15, lines 3-11).

      Revision: Change of the discussion section (p. 11, lines 22-24 and p. 15, lines 3-11).

      Revision plan: Improvement of fig 5a (see ref. #3, point 8).

      Referee #3, major comment 15

      The work should be discussed in the context of the demonstrated physiopathological evidence of the IRF1 and IRF9 functions. IRF9 (Hernandez et al., JEM 2018) and more recently IRF1 (Rosain et al Cell, 2023) were identified as causing non overlapping phenotypes in human patients carrying loss-of-function mutations for these genes. The authors must interpret their results in this context.

      Reply: We thank the referee for reminding us about the importance of these papers for our work.

      Revision: The papers have been mentioned and discussed (p. 13 lines 19-28 and p.14, lines 9-14).

      Referee #3, minor comment 3

      The inconsistency in the title referring to IFNb as Type 1 but using IFNg instead of Type 2 nomenclature, perhaps consistency is best.

      Reply: We agree about the importance of consistency but find ourselves in yet another quandary. While the use of ‘type I IFN’ is clearly indicated and widely used as a collective name for this group of cytokines, the use of ‘type II IFN’ for IFNγ is rare because it is the only member of this type. Hence, we decided to stick with convention at the expense of a bit of consistency. We agree about the title, though, and have changed type I IFN to IFNβ.

      Revision: We adapted the title in agreement with the referee’s comment.

      Referee #3, minor comment 5

      Figure 6d includes a color scale of -1 to +3, but it is unclear what these values represent and how they were calculated per interactor. The figure legend should be revised to clarify this information.

      Reply: We agree. The relevant information has been added to the figure legend.

      Revision: We added information (log2FC with regard to the NLS control) to the legend of fig. 6d.

      Referee #3, minor comment 9

      Fig 1e, S1c. Graphs having circles of varying sizes in function of a value are named "bubble plots" and not "dot plots".

      Reply: Thank you for pointing this out, we corrected our mistake.

      Revision: We changed dot plot to bubble plot in legend to figure S1c.

      Referee #3, minor comment 11

      Fig S3c legend. It is mentioned "Graph represents RT-qPCR of genomic Mx2". RT-qPCR usually stands for reverse transcription quantitative PCR, hence we suggest to change to "ChIP-qPCR" or qPCR. Confusingly, in the literature the term "RT-PCR" is used for real-time PCR and "qPCR" for quantitative PCR. Also, the authors should be specific about the "genomic" region targeted; the graphs mention "promoter", hence it would be appropriate to use the same designation in the legend.

      Reply: We agree and thank the referee for correction of the terminology.

      Revision: We changed RT-PCR to qPCR throughout the manuscript. Moreover, we specifically refer to ‘promoter region’ as the amplified DNA.

      Referee #3, minor comment 12

      Fig S3e. The y-axis names are missing.

      Reply: Thanks for spotting this.

      Revision: The y axis in the figure received its proper label.

      Referee #3, minor comment 14

      Raw cells are sometimes spelled as "Raw" and other times as "RAW". Please choose one for consistency.

      Revision: This inconsistency has been corrected.

      Referee #3, minor comment 15

      In p.10 l.20, the figure number is missing.

      Revision: We corrected this mistake.

      4. Description of analyses that authors prefer not to carry out

      Referee #1, minor comment 1

      Simplify figure 4B- consider focusing on most differentially expressed genes between clusters

      Reply: The purpose of fig. 4B is to provide a visual overview of the kinetics of eRNA transcription in response to both IFN types and of the effects of IRF9 and IRF1 knockouts. This information needs to be given to demonstrate the similarities and differences between the control of eRNA and the corresponding ISG transcripts in the different regulatory clusters (as shown in figs. 1d and 2a).

      Simplifying the figure would mean separating it according to time point, IFN type treatment or knock-out effect. We think this would require the reader to mentally reassemble the figure to understand the interrelationships between these parameters. In our opinion, the visual display of the data interrelationship in fig. 4B facilitates comprehension of the information content.

      Revision: n. a. - we hope our reasoning has become sufficiently clear.

      Revision plan: n. a.

      Referee #1, minor comment 3

      Clarify which cell types (IRF1 KO vs IRF9 KO) are used in figure 5 A/B.

      Reply: The cell type (bone marrow-derived macrophages) is mentioned in the first sentence of the figure legend. Since all experiments except the Bio-ID experiment were performed with this cell type we decided not to label each figure.

      Revision: n. a.

      Revision plan: n. a.

      Referee #2, major comment 2 and referee #3, major comment 14

      Ref #2: Biological significance is limited as this study is largely descriptive and they do not test the hits obtained from BioID.

      Ref #3: Although the TurboID experiments identify known STAT1 and IRF1 interactors, the proposed new interactors are numerous, and none are validated through independent co-IP experiments. Moreover, the results are very noisy, with little differences between untreated BMDMs (where IRF1 is barely expressed) and IFN-treated conditions.

      Reply: The big advantage of BioID or TurboID is the ability to score proximity and very transient interactions. Validating BioID hits with technologies such as coIP is not particularly useful as the two technologies will obviously produce different interactomes. In fact, we show in this manuscript that IRF1 and STAT1 show proximity, but they do not form a stable complex under co-IP conditions. This leaves genetic approaches (LOF or GOF) as alternatives. However, apart from the workload (> 100 genes would have to be knocked out or their products overexpressed), most of our hits are expected to produce very broad effects in such experiments, hard to interpret regarding ISGF3 and IRF1 activities.

      In view of this situation, we publish exclusively the high confidence nuclear interactors identified in our screen: biological replicates were performed in triplicate, a stringent internal control (TurboID-NLS) was used, and a stringent statistical cut-off for high-confidence interactors (95% FDR between groups) was applied. We further account for the experimental situation by limiting interpretation of the data to confirmed molecular events. For example, STAT1 dimers and the ISGF3 complex are required for histone acetylation in response to IFN, and ISGF3 is known to contribute to the exchange of the H2AZ histone variant (refs 11, 14, 71, 72). Our data show that IRF1 contributes to promoter accessibility changes and this is in line with its proximity to a remodelling complex. Thus, the BioID data indeed validate previous findings. However, in agreement with the referee’s comment, some of the data remain descriptive (such as the intriguing proximity of both STAT1 and IRF1 to nuclear products of ISG). To determine the importance of this molecular proximity is a major undertaking and beyond the scope of this study.

      Revision: We added discussion to state the difficulty of validating TurboID-based interactions and the limitations of the TurboID experiments (p.15 lines 3-11).

      Referee #3, minor comment 1

      In most graphs the expression values or log2FC are shown separately for IFNb and IFNg, however in the heatmaps (Fig 1d, S1d) the IFNb and IFNg results are intercalated keeping them side-by-side for each time point, which makes them more difficult to interpret.

      Reply: We are in a quandary about the design of the figure. On the one hand our goal is to visualize gene clusters with distinct behaviors for each IFN type. For this purpose, it would be advantageous to separate the IFN types. On the other hand, we aim at showing similarities and differences between genes induced by each IFN type, for this purpose it is better to maintain the current sample order. While understanding the referee’s point, we prefer to keep the figure as it is, because the suggested change will not increase its overall clarity.

      Revision: n. a.

      Revision plan: n. a.

      Referee #3, minor comment 7

      The statement that "IFN-I are the more important mediators of antiviral immunity" is not entirely accurate and may be an oversimplification, as there are certainly articles which suggest a larger role for type II IFN elements than type I (ref: Yamane D et al., 2019 Nature microbiology). While yes, IFN-I plays a critical role in the innate immune response to viral infections, IFNγ also has antiviral activity and is involved in the adaptive immune response to viral infections, and in some instances to a larger extent than IFN-I.

      Reply: The Yamane et al. study (now mentioned on p. 10, lines 22-25 and referenced) agrees with our findings because it shows that IRF1 contributes to the basal expression of an ISRE-driven ISG subset. Our statement about the predominant role of type I IFN versus IFNγ refers to genetic data in both humans (mainly Casanova’s work including effects of autoantibodies against type I IFN, see also the paper about human STAT2 deficiency in the June 15th issue of the JCI, https://doi.org/10.1172/JCI168321) and mice (hundreds of papers) showing that disruption of type I IFN synthesis or response causes profound effects on antiviral immunity (i.e. resulting susceptibilities are first and foremost to viral pathogens), whereas susceptibilities as a consequence of disrupting the IFNγ pathway are first and foremost to intracellular nonviral pathogens such as mycobacteria. In fact, the term mendelian susceptibility to mycobacterial disease (MSMD) was coined by Casanova and colleagues to describe a variety of human mutations that include those of the IFNγ, but not the type I IFN pathway.

      Maybe more importantly, the Rosain et al. paper mentioned by the referee which appeared in ‘Cell’ while our study was under review, shows that human IRF1 mutations also fall into the MSMD category (new ref. 65). In contrast, the authors did not observe diminished antiviral immunity. This emphasizes the main conclusions of our study about the relevance of IRF1 for macrophage activation. We discuss this paper on p 14. lines 9-14.

      Obviously, this does not exclude a role of type I IFN in nonviral infection or of IFNγ in viral infection, in fact much of our own work has been dedicated to a role of type I IFN in infections with L. monocytogenes. Nevertheless, we think that in a generic statement about the difference between type I IFN and IFNγ it is correct to label the former as predominantly antiviral and the latter predominantly as a macrophage activating factor against nonviral, intracellular pathogens.

      Revision: We added discussion of Rosain et al. (ref. 65) on p 14. lines 9-14.

      Referee #3, minor comment 8

      The authors claim that a significant portion of ISG promoters is associated with ISGF3 upon IFNγ receptor engagement and that the transcriptomes of macrophages treated briefly with IFNβ or IFNγ exhibit remarkable similarity and sensitivity to Irf9 deletion. However, I am uncertain about the extent of consensus on this claim.

      Reply: The data were surprising but supported by ChIP-seq and RNA-seq in wt and IRF9 ko macrophages (ref 10). Data in a follow-up study (ref. 11) and in this manuscript support our original conclusion by demonstrating the impact of the IRF9 ko on IFNγ responses. Importantly, we don’t claim this is true in all cell types, it may well depend on STAT/IRF9 expression levels and tonic IFN signaling.

      Revision: n. a.

      Revision plan: n. a.

    1. Background: Polygenic risk score (PRS) analyses are now routinely applied in biomedical research, with great hope that they will aid in our understanding of disease aetiology and contribute to personalized medicine. The continued growth of multi-cohort genome-wide association studies (GWASs) and large-scale biobank projects has provided researchers with a wealth of GWAS summary statistics and individual-level data suitable for performing PRS analyses. However, as the size of these studies increases, the risk of inter-cohort sample overlap and close relatedness increases. Ideally sample overlap would be identified and removed directly, but this is typically not possible due to privacy laws or consent agreements. This sample overlap, whether known or not, is a major problem in PRS analyses because it can lead to inflation of type 1 error and, thus, erroneous conclusions in published work. Results: Here, for the first time, we report the scale of the sample overlap problem for PRS analyses by generating known sample overlap across sub-samples of the UK Biobank data, which we then use to produce GWAS and target data to mimic the effects of inter-cohort sample overlap. We demonstrate that inter-cohort overlap results in a significant and often substantial inflation in the observed PRS-trait association, coefficient of determination (R²) and false-positive rate. This inflation can be high even when the absolute number of overlapping individuals is small if this makes up a notable fraction of the target sample. We develop and introduce EraSOR (Erase Sample Overlap and Relatedness), a software for adjusting inflation in PRS prediction and association statistics in the presence of sample overlap or close relatedness between the GWAS and target samples. A key component of the EraSOR approach is inference of the degree of sample overlap from the intercept of a bivariate LD score regression applied to the GWAS and target data, making it powered in settings where both have sample sizes over 1,000 individuals. Through extensive benchmarking using UK Biobank and HapGen2 simulated genotype-phenotype data, we demonstrate that PRSs calculated using EraSOR-adjusted GWAS summary statistics are robust to inter-cohort overlap in a wide range of realistic scenarios and are even robust to high levels of residual genetic and environmental stratification. Conclusion: The results of all PRS analyses for which sample overlap cannot be definitively ruled out should be considered with caution given high type 1 error observed in the presence of even low overlap between base and target cohorts. Given the strong performance of EraSOR in eliminating inflation caused by sample overlap in PRS studies with large (>5k) target samples, we recommend that EraSOR be used in all future such PRS studies to mitigate the potential effects of inter-cohort overlap and close relatedness.

      This work has been peer reviewed in GigaScience (see Description), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Jack Pattee

      Overall, I think that this manuscript is strong and describes a well-formulated method to address a relevant problem. There are a few outstanding questions about the performance of the EraSOR method from my perspective, which I'll detail as follows.

      My understanding of reference [16] indicates that equation (3) of this manuscript only holds for null SNPs, i.e. if SNP g is not associated with the outcome Y. If this is the case, then this should be discussed in the manuscript. I wonder if this can partially explain the 'under-estimation' behavior we see in the application to real data in Supplementary Figure 3. In particular, I am referencing the behavior where the EraSOR correction will under-estimate the predictive accuracy of the PRS in the target data, i.e. where delta-R^2 is negative. This behavior is not seen in the simulation and warrants further investigation and discussion. While the bias appears small, for some cases delta-R^2 approaches -.025, which corresponds to an under-estimation of Pearson's r by roughly .15; this is substantial. Could it be the case that, for highly polygenic traits such as height and BMI, the null-SNP assumption is unreliable and the performance of EraSOR is degraded? Does a fundamental assumption of sparse genetic association underlie EraSOR?

      I recommend that the real data application play a larger role in the manuscript narrative and be moved out of the supplementary. The simulations are appreciated and helpful, but there is nuance in the analysis of real data that cannot be replicated in simulation.

      I believe the reference to "Supplementary Figure 2" on line 346 should actually be "Supplementary Figure 3". I believe that the axis labels in Supp Figure 3 are flipped.

      Lines 82 and 83 reference genetic stratification and subpopulations; I think the relevance of these concepts should be introduced more clearly and they should be defined in this context. EraSOR concerns the overestimation of predictive accuracy and association incurred by sample overlap between the base and target GWASs; to this reader, it's not clear what this central issue has to do with population stratification. I realize that the derivation of the LD score method is motivated heavily by correcting for stratification; however, these concepts should be introduced more clearly in this manuscript.

      Line 88: consider defining LD score l_j.

      Lines 94-96: consider outlining the mathematical consequence of the assumption that "the two outcomes and cohorts are identical." It's the case that N_1 = N_2 = N_c = N, correct?

      Line 109 / equation (11): My understanding is that the relevant quantity of this derivation is N_c / sqrt(N_1 N_2), which allows us to define the correct matrix C in expression (4). If this is the case, perhaps the quantity of interest should be moved to the LHS of the equation in the final line of the expression, for clarity.

      As discussed in the manuscript, the estimated heritability is in the denominator of the expression for N_c / sqrt(N_1 N_2). The authors correctly discuss that the method should not be applied when there is doubt as to whether the heritability is different from zero. I would take this a step further; in cases where the heritability is zero, we cannot meaningfully apply the EraSOR correction, and thus I am not sure of the utility of the 'type I error' simulations in the manuscript. Perhaps an explicit test for h^2 > 0 should be worked into the EraSOR workflow?

      Line 148 / expression (12): If beta has a normal distribution here, it is the case that all SNPs in the simulation are associated with the outcome Y. This is a somewhat unusual choice for the distribution of SNP effects in a simulation; other applications such as LDPred (Vilhjalmsson et al, AJHG 2015) and LassoSum (TSH Mak et al, Genetic Epi 2017) use a point-normal distribution for simulated SNP effects, which effectively simulates the sparsity frequently observed in nature. Is there a reference or justification for the non-sparse simulation structure here?

      Line 215: there may be a typo in the expression for the variance of the residual term. Is it the case that the variance of the residual depends on the variance of a covariance term? If so, I am confused as to the derivation.

      Line 241: 'triat' should be 'trait'.

      The simulation results in this paper are based on clumping and thresholding for PRS, which does not estimate joint SNP effects i.e. account for LD. Methods such as LDPred and LassoSum do so. Is there any reason to believe the results would be different for a method such as LassoSum?

      I am confused by the very low Fst between the simulated Finnish and Yoruban samples in simulation. As detailed on line 385: the reported Fst is > .1, but the simulated Fst is essentially zero. This seems likely to be an undesirable simulation artefact, and potentially invalidates the simulation study (or, at least, doesn't provide evidence that EraSOR functions correctly when Fst is large, which was the ostensible motivation for this simulation). Is there no way to effectively simulate populations with a larger Fst?
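
      To make the contrast raised in the comment on expression (12) concrete, the sketch below draws SNP effects from a point-normal (spike-and-slab) distribution: each SNP is causal with probability p and then receives a normal effect with variance h^2 / (M p), otherwise its effect is exactly zero. Setting p = 1 recovers the non-sparse, all-SNPs-causal case. This is only an illustrative sketch; the function name, parameter values and seed are hypothetical and are not taken from the manuscript or from the LDPred or LassoSum software.

```python
import random


def simulate_point_normal_effects(n_snps, causal_fraction, heritability, seed=0):
    """Draw per-SNP effects from a point-normal (spike-and-slab) prior.

    With probability `causal_fraction` a SNP gets an effect drawn from
    N(0, heritability / (n_snps * causal_fraction)); otherwise its effect is 0.
    The scaling keeps the expected sum of squared effects equal to
    `heritability` (assuming standardized genotypes).
    """
    rng = random.Random(seed)
    sd = (heritability / (n_snps * causal_fraction)) ** 0.5
    return [
        rng.gauss(0.0, sd) if rng.random() < causal_fraction else 0.0
        for _ in range(n_snps)
    ]


# Hypothetical settings: 10,000 SNPs, 1% causal, h^2 = 0.5
effects = simulate_point_normal_effects(10_000, 0.01, 0.5)
n_causal = sum(1 for b in effects if b != 0.0)
print(f"{n_causal} of {len(effects)} SNPs have a non-zero effect")

# causal_fraction = 1.0 gives the non-sparse case in which every SNP is associated
dense_effects = simulate_point_normal_effects(10_000, 1.0, 0.5)
```

      Comparing PRS performance under the sparse and dense settings would be one way to probe whether the sparsity of the genetic architecture matters for the correction.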

    1. And I bid you all do likewise. In an ordinary crime, how does one defend the accused? One calls up witnesses to prove his innocence. But witchcraft is ipso facto, on its face and by its nature, an invisible crime, is it not? Therefore, who may possibly be witness to it? The witch and the victim. None other. Now we cannot hope the witch will accuse herself; granted? Therefore, we must rely upon her victims – and they do testify, the children certainly do testify. As for the witches, none will deny that we are most eager for all their confessions. Therefore, what is left for a lawyer to bring out? I think I have made my point. Have I not?

      The logic Danforth uses to justify and explain to Hale why a lawyer is not necessary in this instance is a flawed logic.

      He states that witchcraft is "an invisible crime" at which only the witch and the victim are present, and that, since the witch will hardly accuse herself, the court must rely upon the victim to testify by identifying the witch in question.

      BUT what he fails to take into account here is that he is assuming that the "victims" are actually victims in the first place and that their accusations are true. He has no real evidence of this other than the girls' confessions. Danforth thus makes a big mistake in assuming that their accusations are valid and to be believed.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1* (Evidence, reproducibility and clarity (Required)): *

      * Srinivasan et al. present a comprehensive study on systematizing the structure-dynamics-function relation of lipid transfer proteins (LTPs), combining extensive molecular simulations and complementary experiments. Indeed, the current state-of-the-art in the field is quite chaotic and fractional, and such systematic studies are necessary to advance our general and conceptual understanding of the mechanisms of action of LTPs. The selected techniques and research strategies are all suitable, their description is sufficient and enables reproducibility; the obtained results are carefully presented and discussed; the conclusions are adequately supported by the data.

      Given my primarily computational background, I evaluated mainly the simulation part of the manuscript. Considering experiments, I do not see any significant flaws or deficiencies that could diminish the value of the data and following conclusions given in the manuscript. I would even suggest improving the abstract by more explicitly saying that this work includes experimental measurements because it currently reads like purely computational work was performed. *

      We thank Reviewer #1 for the positive evaluation of our work. The abstract has now been updated to include that our work allows us to interpret existing data but also to design and perform new experimental measurements.

      * Major comments: *

      1) Although I like the central message of the paper and have no objections, I am curious whether the conclusion "a more "dynamic" or/and "mobile" part of the protein interacts with the membrane or any other (macro)(bio)molecule" makes sense globally and is not limited to LTPs. For example, it is a reasonable assumption that a more flexible part of the protein, i.e., capable of adopting necessary binding configurations, would be a more likely interacting spot. Locking in a less flexible and more specific configuration upon binding with a target molecule is also anticipated and quite typical, e.g., when ligands interact with target proteins, thereby blocking their function. The authors themselves recognize this paradigm as referring to the enzymes' dynamics. It would be great if authors could comment more on dynamics-function relation, referring to the existing literature, where such observations were/were not observed for different protein families. Performing simulations on proteins that do not exhibit such a feature and do not belong to LTPs, but, e.g., structurally similar to some of the studied LTPs, would be an excellent addition too, highlighting this signature characteristic of LTPs.

      We have now added a discussion comparing the mechanism we observe with those described for other proteins such as membrane transporters and receptors. Since those proteins are very different and have been already thoroughly characterized (including with molecular simulations) we don’t think that additional simulations are required. Also, concerning protein binding dynamics, we refer to the excellent review of Wade and coworkers: "Acc. Chem. Res. 2016, 49, 5, 809–815"

      "Notably, the conformational plasticity we observe for LTPs is reminiscent of other, previously described, functional protein mechanisms, including enzyme dynamics during catalysis (DOI: 10.1126/science.1066176), the alternating-access model of membrane transporters (https://doi.org/10.1038/nsmb.3179) or GPCR dynamics (https://doi.org/10.1021/acs.chemrev.6b00177). In all these cases, protein dynamics is strongly coupled to ligand binding (https://doi.org/10.1021/acs.accounts.5b00516) and protein function, be it for signaling, transport or enzymatic activity. Unlike for these fields, however, the contribution of structural and spectroscopic studies to uncover LTP dynamics remains quite limited, and our simulations provide an important contribution to fill this gap. We hope that our results will motivate researchers to increase efforts to experimentally quantify LTPs conformational plasticity, e.g. by structural determination of LTPs in different states (or bound to different lipids) or by single-molecule spectroscopy studies."

      *Minor comments: *

      *

      1) Fig 1d. What is so special in Lysine compared to Arginine? Is there any disbalance in their presence in studied proteins? Any correlations between the binding affinity of certain amino acids and their overall presence on the protein surface? *

      Indeed, there is an imbalance in the number of lysine and arginine residues in our proteins: the ratio in our dataset is Lys:Arg = 1.6:1. On top of that, and as described in (Tubiana T et al PLoS Comput Biol. 2022 ;18(12):e1010346), lysine is preferred over arginine in peripheral membrane proteins, likely because it induces fewer perturbations in the lipid bilayer. Our data also agree with Tubiana et al. concerning the correlation between the abundance of specific residues on the protein surface and membrane binding.

      * 2) Fig S1. GM2A and TTPA seem to be irreversibly adsorbed to the membrane on the microsecond timescale in most replicas. Is anything special in these proteins? Did this affect the sampling of a claimed membrane-binding interface?*

      Our interpretation of the different adsorption profile of GM2A and TTPA is that these two proteins appear to have higher membrane affinity in our computational assay in comparison with the other proteins in our dataset. However, this has no effect on the membrane-binding interface as the proteins are still able to undergo significant tumbling before binding to the lipid bilayer, as demonstrated by the angle between the two main protein axes and the bilayer normal before membrane binding (Fig. S8 in Supplementary Information).
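
      As an illustration of the orientation measure referred to here, the angle between a protein axis and the bilayer normal can be computed as in the minimal sketch below. It assumes the principal axis has already been obtained (for example from the inertia tensor of the protein) and that the bilayer normal lies along z; the function name `tilt_angle_deg` is hypothetical and this is not the analysis script behind Fig. S8.

```python
import math


def tilt_angle_deg(axis, normal=(0.0, 0.0, 1.0)):
    """Angle in degrees between a protein axis and the bilayer normal.

    `axis` is any non-zero 3-vector, e.g. a principal axis of the protein.
    The bilayer normal defaults to +z (an assumption of this sketch).
    """
    dot = sum(a * n for a, n in zip(axis, normal))
    norm_axis = math.sqrt(sum(a * a for a in axis))
    norm_normal = math.sqrt(sum(n * n for n in normal))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, dot / (norm_axis * norm_normal)))
    return math.degrees(math.acos(cos_theta))


# A time series of this angle that spans a wide range before binding
# indicates that the protein tumbles rather than keeping its start pose.
angles = [tilt_angle_deg(v) for v in [(0, 0, 1), (1, 0, 0), (0, 0, -1)]]
```

      Tracking this angle over each replica before the first membrane contact gives a direct check of whether the common starting orientation biases the final binding interface.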

      * 3) A related follow-up question. Multiple replicas were performed to identify the membrane-binding interface. However, if I understand well, the initial orientation of the protein with respect to the membrane was always the same. I found it a pity since performing multiple replicas starting from different initial geometries (e.g., rotating the protein in a somewhat systematic way) would likely result in a more efficient exploration of the conformation space. Can the authors comment on whether this predefined initial configuration could negatively affect the results? Performing a few additional simulations for the most problematic proteins I mentioned earlier (GM2A and TTPA) could be a nice opportunity to apply this strategy. *

      In our protocol, all proteins start from the same initial orientation but undergo significant tumbling in solution before interacting with the lipid bilayer, including the two most extreme cases, GM2A and TTPA (Fig. R1). Hence, we think that there is no bias with respect to the final membrane-interacting region. We have added Fig. R1 to the Supplementary Information (Fig. S8) and added the following text to the Methods Section:

      "Despite starting from a single orientation, all proteins undergo extensive tumbling before binding to the bilayer, as illustrated by the angle between the two principal protein axes and the membrane normal for the two proteins that display the highest binding propensity, GM2A and TTPA (Fig. S8)."

      * 4) How was the volume of the cavity affected by mutations in STARD11 and Mdm12? Do these data somehow correlate with the experimentally observed reduced efficiency of the lipid transfer? *

      Our data on the volume of the cavity in STARD11 and Mdm12 are inconclusive. However, we caution from such a simplistic interpretation, since it completely neglects the lipid-bound conformation that normally has a much larger cavity than the apo form (Fig. 3).

      *5) I would appreciate it if the authors considered playing with the templates of the main Figures at later stages because in the current version, and when printed on A4 paper, the readability of certain graphs and pictures is uncomfortable and sometimes even impossible. Obviously, the final schematics would depend on the journal and its formatting. *

      We will modify the templates of the main Figures to improve readability according to journal formatting.

      * **Referees cross-commenting** *

      * I would like to acknowledge the thoughtful and detailed reviews provided by other reviewers. I do like their reports, and I believe that by addressing the reviewers' comments and incorporating their revisions, the article will significantly improve in terms of scientific rigor and contribution to the field. *

      *Reviewer #1 (Significance (Required)):

      This manuscript is a solid scientific work addressing gaps in our knowledge about Lipid Transfer Proteins by employing state-of-the-art methods. It advances the field on conceptual and fundamental levels. This study is of interest to both computational biophysicists and physical chemists (to whom I belong myself) as well as experimentalists, who seek a rational explanation of the experimental observations. *

      We thank the reviewer again for the positive evaluation of the significance of our work.

      Reviewer #2* (Evidence, reproducibility and clarity (Required)): *

      * Summary:

      In a combined computational and experimental study, the authors provide insights into general features of lipid transfer proteins (LTPs), which play key roles in lipid trafficking: Through molecular dynamics simulations of a diverse set of 12 shuttle-like LTPs, they demonstrate that LTPs consistently exist in an equilibrium between two or more conformations, whose populations are modulated by a bound lipid, and that residues significantly involved in these collective conformational changes typically interact with a membrane. Their simulations indicate that conformational plasticity is a general feature of LTPs, leading them to suggest that the ability to change conformations is essential for LTP function. They test the generality of this hypothesis through in cellulo assays of two LTPs (STARD11 and Mdm12) that were not originally simulated. While experiments of STARD11 support their hypothesis, those presented for Mdm12 provide ambiguous results. *

      *

      Major comments: *

      * Throughout the manuscript, it's stated that common 'dynamical features' correlate with LTP function. The accuracy of this statement is unclear since 'dynamical features' are never precisely defined and, while equilibrium conformational ensembles are characterized, dynamics (ie kinetics or time-dependent observables) are not. Please clarify.*

      We plan to improve the scholarly presentation of our article to clarify this issue. In short, two distinct properties modulate protein function: 1. Conformational plasticity, i.e. the (thermodynamic) ability of the protein to adopt different conformations (and with different populations depending on the bound substrate). 2. Conformational “dynamics”, i.e. the propensity to exchange between these different thermodynamic states. This ability depends on the free energy barriers between different states and it is intrinsically a kinetic (rather than thermodynamic) property.

      *More importantly, further evidence is needed to determine a correlation with *function*. LTPs are suggested to have faster transfer rates (a measure of function) if the apo form adopts a substantial population of holo-like conformations, akin to enzyme preorganization. This is further tested by rationally mutating STARD11 and Mdm12. However, the support for this conclusion and if these mutations alter the LTPs conformational ensembles as desired is unclear: *

      In our opinion, the interpretation suggested by Reviewer #2 that there is a “correlation” between transfer rates and the overlap of apo-like and holo-like conformations, though fascinating, cannot be derived from the available data at this stage, and we did not mean to imply this. Rather, lipid transport is a complex phenomenon that involves several steps (membrane binding/unbinding, lipid uptake/release,…). Our simulations indicate that protein conformational plasticity, including potentially the overlap between apo-like and holo-like conformations, also influences lipid transfer rates. We will clarify this aspect in the text.

      * Is there a quantitative correlation between the overlap of apo and holo conformational distributions (as could be quantified by KL divergence or Wasserstein distance, for example) and difference in transfer rates as suggested by Fig S6?*

      We plan to quantify the overlap between the apo and holo conformational distributions for Fig. S6 and for the mutant simulations (see answer below) but, as discussed above, we are skeptical that we will observe a clear correlation with transfer rates.
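
      The two overlap measures suggested by the reviewer can be sketched for one-dimensional observables (for example, cavity volume in apo and holo simulations) as follows. This is a minimal illustration under stated assumptions: the samples are of equal size for the Wasserstein distance, the KL divergence is estimated from histograms with a small pseudo-count, and the function names and default bin count are hypothetical rather than taken from the manuscript.

```python
import math


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1D samples.

    For equal sample sizes this is the mean absolute difference between
    the sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def kl_divergence(xs, ys, n_bins=20, pseudo=1e-6):
    """Histogram estimate of KL(P_x || P_y) on a shared grid of bins."""
    lo = min(min(xs), min(ys))
    hi = max(max(xs), max(ys))
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data

    def hist(values):
        counts = [pseudo] * n_bins  # pseudo-count keeps every bin non-zero
        for v in values:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p, q = hist(xs), hist(ys)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical apo and holo samples of a 1D observable.
apo = [1.0, 2.0, 3.0]
holo = [2.0, 3.0, 4.0]
w = wasserstein_1d(apo, holo)
```

      The Wasserstein distance is expressed in the units of the observable and is insensitive to binning, whereas the histogram-based KL estimate depends on the bin count and pseudo-count, so both choices would need to be reported.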

      * The conclusion and the generality of the findings would be greatly strengthened if a correlation can be shown for other LTPs through additional simulations of mutants whose transfer rates have been previously characterized experimentally in the literature. (For example: Ryan 2007 PMID 17344474, Grabon 2017 PMID 28718450, Iaea 2015 PMID 26168008, among many others)*

      We are currently running simulations of several mutants to address this point and provide additional data/context.

      * While differences in the apo conformational ensembles of the WT and mutants are observed in Fig S7b and d, if these mutations reduce overlap with holo-like conformations is not determined. Simulations of the WT holo forms are needed to properly test this hypothesis. *

      We are currently performing these simulations.

      • For Mdm12, mutations are specifically made to "lock the protein in the apo-like state;" however, the mutant adopts conformations distinct from the apo form as shown in Fig S7d. How do the authors interpret the results of the cellular assays considering this and could it help explain why the mutant has similar kinetics to WT? What may explain the puzzling results of similar transfer kinetics but differing mitochondrial morphology? *

      As discussed above, interpretation of lipid transport rates based exclusively on apo and holo conformational population is premature, as this is a complex mechanism that depends on many variables. For what concerns the experimental results, we think three explanations are possible: 1. Mitochondrial morphology could be more sensitive to small variations in lipid composition than our METALIC assay. 2. Our assay only quantifies transport of unsaturated PC and PE species, and we can’t quantify variations in transport of other lipid species that are likely to also be transported by ERMES, such as PS and PA. 3. According to a recent structural model (Wozny et al, Nature 618, 88–192, 2023), Mdm12 might be part of a tunnel-like LTP complex in which it doesn't establish direct interactions with nearby organellar membranes. As such, its mechanism might be different from the one described here for other shuttle-like lipid transport domains. We will discuss these possibilities in the main text.

      • Confounding factors potentially complicate the interpretation of the in cellulo experiments. Simpler in vitro experiments may be better suited to determine if altering LTP's biophysical properties, namely rationally altering the population of apo- vs holo-like configurations, quantitatively affects transport rates as suggested.*

      We agree with Reviewer #2 that this information could be useful. However, this is beyond our technical abilities, and it would require lengthy and expensive experiments that are unlikely to be completed within a reasonable time frame for a revision (3 months). We have instead opted to better discuss our model in the context of published in vitro lipid transport experiments.

      • The abstract, intro, and title highlight that the manuscript's findings are indicative of and correlated with *function* but on p. 12 it's foreseen "that future studies will focus on the functional consequence of such observation." Please reconcile these conflicting statements and ensure connections to function are accurately described. The current title is rather bold. *

      We will rewrite and clarify the extent of our hypotheses and validations.

      * All mentions of "correlation" throughout the manuscript need to be quantitatively evaluated or properly qualified. In addition to that mentioned above regarding Fig S6, what is the correlation coefficient between residues' contribution to PC1 and membrane interaction frequency (Fig 2)? *

      To address this point, we will quantify the correlation between residues' contribution to PC1 and membrane interaction frequency. However, we expect a low correlation between residues' contribution to PC1 and membrane interaction frequency for at least two main reasons. First, not all residues contributing to PC1 interact with membranes, but only a subset, as discussed above. Second, our methodology to compute membrane binding, based on the geometric distance between residues and bilayer, is intrinsically quite noisy (since residues in proximity of bona fide membrane binding regions will also appear as involved in membrane binding), thus making quantification of correlations somewhat inaccurate. Rather, we will try to explain in the text that our observations are not of "correlation" but rather of dependence/association, and we will use quantitative measures to quantify these properties (such as rank correlation coefficients or multivariate analyses).
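
      One of the rank correlation coefficients mentioned here, Spearman's rho, can be sketched as follows. The two per-residue input lists (contribution to PC1 and membrane interaction frequency) and the function names are assumptions for illustration; this is not the analysis performed for the manuscript.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-residue values, for illustration only.
pc1_contribution = [0.10, 0.40, 0.35, 0.80]
membrane_frequency = [0.00, 0.20, 0.10, 0.90]
rho = spearman(pc1_contribution, membrane_frequency)
```

      Because it depends only on ranks, this measure is less sensitive than Pearson's coefficient to the noisy, distance-based definition of membrane contacts described above.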

      * Residue's contributions to collective conformational changes are found to be indicative of membrane binding. Yet, membrane interacting residues are identified from CG simulations that cannot capture such collective conformational changes due to the use of an elastic network. Given that the CG simulations agree with previous experimental findings, this suggests that collective conformational changes are not important for membrane binding. *

      We disagree with this interpretation by Reviewer #2 of our data: we do not claim that residue's contributions to collective conformational changes is indicative of membrane binding. Rather, membrane binding happens at protein regions displaying high contribution to collective conformational changes. This distinction is subtle but important: protein motion does not determine membrane binding regions. Rather, it appears that, for LTPs, membrane binding regions are also characterized by collective motions (suggesting function). We will clarify this in the main text.

      *Are similar conclusions drawn from residues' RMSFs? In other words, are local conformational fluctuations just as indicative of membrane binding? *

      We will compute protein residues’ RMSFs and compare them with the membrane binding data. However, given that RMSF is representative of thermal fluctuations, we again expect a poor correlation between RMSF and membrane binding. On the other hand, we indeed observe that most membrane binding regions are protein loops, but this is not unexpected (e.g. Tubiana et al, PLoS Comput Biol. 2022 Dec; 18(12): e1010346.). However, such an observation does not provide any information on lipid transport, but only on the mechanism of membrane binding. Rather, the observation of a relationship between membrane binding and global motion is more interesting, since the latter is often indicative of protein function.
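
      For reference, the per-atom RMSF discussed here is the root of the time-averaged squared deviation of each atom from its mean position. The sketch below assumes the trajectory frames have already been aligned to a common reference and are supplied as nested lists of (x, y, z) tuples; the function name and data layout are hypothetical.

```python
import math


def rmsf(frames):
    """Per-atom RMSF from frames[t][i] = (x, y, z).

    Frames are assumed to be pre-aligned (rotation and translation removed).
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(f[i][d] for f in frames) / n_frames for d in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(
            sum((f[i][d] - mean[d]) ** 2 for d in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Two frames, two atoms: the first atom is static, the second moves along x.
example_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
values = rmsf(example_frames)
```

      RMSF captures the local amplitude of fluctuations of each atom, which is why it can differ from a residue's contribution to a collective mode such as PC1.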

      *The stated correlation may in fact be spurious and instead arise because residues at the entrance to LTP's hydrophobic cavities need to be positioned at the membrane surface for productive lipid uptake and these same residues must undergo significant conformational changes to allow lipid entry. *

      This is exactly what we think is happening and what our data suggest. However, one must remember that our simulations allow us to predict the membrane binding interface, which is often difficult to determine experimentally (and often only via indirect evidence). Hence our data provide novel evidence in this direction.

      *Is proximity to cavity entrance more or less correlated with membrane binding than 'dynamics'? *

      If we consider that, as discussed before, dynamics does not correlate with membrane binding (there are many dynamical regions that are not at the membrane interface), it is safe to assume that proximity to cavity entrance would correlate more with membrane binding. However, we have to consider that often we do not know where the cavity entrance in LTPs is located simply based on structure alone, and hence our approach provides important clues into this process.

      p.12 speculatively suggests "the high degree of protein dynamics we observed in membrane proximal regions could potentially facilitate the energetically unfavorable reaction that involves the extraction of a lipid from a membrane." Yet, the logic behind this idea does not make sense since a free energy barrier, an equilibrium thermodynamic quantity, cannot be lowered by changes in dynamics. Please explain.*

      Our current understanding of the mechanism of lipid extraction is quite poor. However, both using chemical intuition and following a recent MD study on one LTP (Rogers et al, 2023, Plos Comp Biol), it is safe to assume that the hydrophobic environment around the lipid is important for its stabilization in the lipid bilayer. Hence, reducing the number of hydrophobic contacts between the lipid and its environment could facilitate transport. A highly dynamic protein, by cycling between different conformations, could “stir” the bilayer, and hence decrease the number of contacts between the lipid and its environment favoring transport. We will clarify this point in the text.

      *Examining how the LTPs impact membrane properties would offer insight into the functional relevance of such residues for lipid extraction. *

      Indeed, our point above is connected to this one. We are performing simulations to compute hydrophobic contacts in bilayer as proposed in (Rogers et al, 2023, Plos Comp Biol).

      The authors highlight that a bound lipid alters LTPs' conformational ensembles akin to "conformational selection" or "induced fit." How sensitive are these findings to the bound lipid species? Do LTPs with multiple known substrates exhibit an increasing diversity of holo conformations and are different conformations stabilized by different substrates? Would similar observations (Fig 3) be made with a lipid that is not known to be transferred by a given LTP? An interesting future direction would be to examine if lipid substrate specificity could be assessed by comparing conformational ensembles to that of a known substrate and/or by overlap with the apo ensemble.

      We deem that the role of lipid specificity on LTP conformational plasticity is beyond the scope of the current work. While this topic is certainly worth future investigations, we must point out that (i) not all proteins bind/transport multiple lipids (at least according to current knowledge) and (ii) only few LTPs have been structurally characterized bound to different lipids (Osh4, Osh6, …). This limitation prevents a wide generalization, and we prefer not to speculate on this topic. So far, we have tested our approach for Osh4 bound to cholesterol or PI(4)P and found that indeed the protein exhibits different holo conformations (in agreement with the experimental data) when bound to different substrates. We have added a short comment on this topic in the Discussion section.

      "We foresee that future studies will focus on the functional consequence of such observation, and most notably to the characterization of the extent to which such conformational changes affect multiple steps of protein function, including membrane binding or lipid extraction and release, and whether these are further modulated when different lipids are being transported."

      For LTPs to transfer lipids between membranes, transitions between apo and holo forms ought to occur when LTPs are membrane bound. How does membrane binding influence the conformational ensembles observed in solution? Does it promote conformational changes between apo- and holo-like structures, as suggested to regulate lipid uptake and release by previous studies of Osh/ORP, Ups/PRELI, and START family members? (For example: Miliara 2019 PMID 30850607, Watanabe 2015 PMID 26235513, Grabon 2017 PMID 28718450, Iaea 2015 PMID 26168008, Kudo 2008 PMID 18184806, Dong 2019 PMID 30783101) While answering these questions would require further computational effort, doing so will allow more accurate assessment of the role of conformational changes in LTP function.

      Unfortunately, we cannot currently quantify how membrane binding influences the conformational ensembles observed in solution, as the slowdown in diffusion at the water-membrane interface makes this task computationally challenging (and certainly not feasible within the time frame of a review). We have so far tested two different proteins and have not succeeded in converging their conformational distribution when membrane-bound despite long MD simulations that lasted several months (even though the non-converged data indicate sampling of both “open” and “closed” conformations). Interestingly, our observations are in qualitative agreement with a recent study on CPTP (Rogers et al, PLOS Comp Biol, 2023), where membrane-bound CPTP is able to sample different conformations (“open” and “closed”) but not to transition between the two states in 300 ns-long MD simulations.

      * The authors motivate the study with the *assumption* that a common molecular mechanism of LTP function exists. Yet LTPs have evolved diverse sequences, structures, and substrate preferences; thus there seems to be no a priori requirement (or even necessarily a benefit) for a single molecular mechanism. What evidence then supports this premise? While previous studies are limited to individual LTPs, when viewed altogether retrospectively, they suggest features that could be shared among LTPs. Synthesizing previous studies and more thoroughly referencing them (only 5 are cited in the intro on p. 3) would strengthen both the premise and findings of the manuscript. *

      Indeed, although LTPs have different structures and substrates and target distinct organelles, previous evidence seems to suggest a potential role of protein conformational plasticity in function, e.g. for Osh/ORP (Jun Im et al, Nature 2005; Canagarajah et al, JMB 2008; Moser von Filseck et al, Nat Comm, 2015; Lipp et al, Nat Comm. 2019,...), StART (Arakane et al, PNAS, 1996; Feng et al, Biochemistry, 2000; Grabon et al, JBC, 2017; Khelashvili et al, eLife, 2019;...) and PITP domains (Tremblay et al, Archives of Biochemistry and Biophysics, 2005; Ryan et al, MBOC, 2007; …). Our simulations provide additional evidence in this direction and allow these observations to be generalized, making it possible to draw parallels with “enzyme-like” or “transporter-like” features that could be exploited to design further testable hypotheses. We will rewrite our text to better contextualize/acknowledge previous findings and to clarify these points.

      *The LTPs investigated are known to target distinct membranes. Should they then be expected to share structural or sequence-based features predictive of membrane binding interfaces, as motivates the analysis in Fig 1d, 1e, and S3? Or is it beneficial for LTPs to recognize membranes in different ways? *

      Since membrane binding is membrane/organelle-specific, it is possible that residue diversity at membrane binding interfaces is indeed beneficial for this specificity. We will add this comment as a potential explanation of our finding of a lack of conserved sequence-based features for membrane binding interfaces.

      *

      Minor comments:*

      * 2 "making lipid transfer across the cytoplasm a potentially energetically favorable process": Is it meant that it is less energetically costly than transfer without a LTP? Why it would be energetically favorable is unclear (and would indicate that the LTP sequesters lipids away from membranes instead of transferring them between membranes). *

      Yes, this is what we meant. We will rewrite this appropriately.

      * 3 "The excellent agreement between the membrane interface determined from the simulations and the experimentally-proposed one available for... Osh6" is missing a citation. *

      We have now added the relevant citation.

      * The plots in Fig 1d and S3 are difficult to interpret. Bar plots, for example, would allow easier comparison and evaluation. Currently, it seems that most proteins individually exhibit some of the same trends observed among the whole set, counter to the conclusion on p 5. *

      We will improve the presentation of our Figures.

      * Negatively charged residues engage in a number of membrane interactions (Fig 1d and S3). What is a potential explanation for this unconventional observation? *

      One possible interpretation is that negatively charged residues could interact with positively charged moieties (ethanolamine, choline) of PC and PE lipids.

      * How much variance is captured by PC1, and how many PCs are needed to capture most of the variance in the conformations? *

      PC1 explains 38 % of the total variance on average, whereas PC2 accounts for 17 %. Therefore, PC1 and PC2 together capture most of the variance in almost all cases.

      We have also added this to the text:

      "We specifically focused on PC1 as it explains most of the variance in the dynamics (38% on average for all the proteins in our dataset, see Supplementary Table 2)."

      We have computed this variance and we have added this analysis in Supplementary Information.
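
      To make the reviewer's two questions concrete (how much variance PC1 captures, and how many PCs are needed to reach a given fraction), the explained-variance ratios can be derived from the PCA eigenvalues as in the minimal sketch below. The eigenvalues are hypothetical: only the first two are chosen to reproduce the average ratios quoted above (38 % and 17 %), and the function names are not from the manuscript.

```python
def explained_variance_ratio(eigenvalues):
    """Fraction of the total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def n_components_for(eigenvalues, threshold):
    """Smallest number of leading PCs whose cumulative ratio reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratio(ordered), start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return k
    return len(ordered)


# Hypothetical spectrum: the first two values mimic the reported averages,
# the remaining values are invented for illustration.
eigenvalues = [38.0, 17.0, 15.0, 10.0, 10.0, 10.0]
ratios = explained_variance_ratio(eigenvalues)
k_half = n_components_for(eigenvalues, 0.50)
k_three_quarters = n_components_for(eigenvalues, 0.75)
```

      Reporting the cumulative curve per protein, rather than only the average for PC1, would show directly how many components are needed to describe each LTP.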

      * Plots in Fig 3, especially panels c and d are difficult to see. Please make the panels larger (perhaps a 3 x 4 layout instead of 2 x 6 would work better). *

      We will improve the presentation of our Figures.

      * 8 "these conformational changes are localized in protein regions that interact with the lipid bilayer" is contradicted by the results in Fig 2b showing that all residues with large contributions to PC1 do not interact with the membrane and discussed on p 5. *

      As discussed above, we don’t observe “correlation” between membrane binding and conformational plasticity, but we rather observe that membrane binding regions display high conformational plasticity (the opposite is not true). We will further clarify in the text.

      *

      8 "in the absence of bound lipids, it is able to sample multiple conformations" is not supported by the orange distributions in Fig 3d that appear unimodal. Is it instead meant that the apo form exhibits larger variance in cavity volume? *

      Yes, this is what we meant. We’ll clarify.

      *

      Please clarify if the elastic network was constructed to maintain the holo or apo structures of each protein and if a bound lipid was used in the CG simulations. *

      For membrane binding CG simulations, we used the apo structure and no bound lipid was used in the simulations. However, analogous simulations in the holo form (not shown) have essentially identical membrane binding interfaces.

      * Was CHARMM TIP3P used? *

      Yes.

      * Please clarify how membrane interacting residues were defined and how interaction frequency was calculated from the longest duration of interaction. *

      We will add this explanation to the Methods. The method is identical to that of Srinivasan et al. (Faraday Discussions, 2021).

      * Refs 16 and 45 refer to the same paper. *

      Thanks, it is now corrected!

      * Reviewer #2 (Significance (Required)): *

      * General assessment: *

      * The work aims to tackle a grand question regarding membrane homeostasis mechanisms-what are universal principles underlying LTP function-and offers initial insights; however, further evidence is needed to support the conclusions as written, and some key results require further investigation and explanation. *

      *Advance and audience: *

      *

      By concurrently investigating the largest number of lipid transfer proteins to-date, the authors provide data invaluable for uncovering general mechanisms of non-vesicular lipid transport and advancing our understanding of membrane homeostasis mechanisms. By illuminating the wide-spread importance of conformational plasticity among lipid transfer proteins, the work presents a conceptual advance in our understanding of lipid transfer mechanisms and unifies previous studies. Because the manuscript emphasizes common biophysical principles and draws connections to enzyme biophysics, it ought to be of interest not only to membrane biologists but biochemists and molecular biologists more broadly.*

      We thank Reviewer #2 for the very positive evaluation of the significance of our work and for the in-depth analysis provided that will certainly help improve the quality of our work.

      Reviewer #3* (Evidence, reproducibility and clarity (Required)): *

      *The article "Conformational dynamics of lipid transfer domains provide a general framework to decode their functional mechanism." by Sriraksha Srinivasan, Andrea DiLuca, Arun Peter, Charlotte Gehin, Museer Lone, Thorsten Hornemann, Giovanni D'Angelo and Stefano Vanni study the interaction of Lipid transport Domains with membranes. This is done mainly by molecular modelling but also with selected experimental validations. *

      * Major comments: *

      * - The key conclusions are generally well supported by the analysis. - The authors could however analyze in more details some aspects in which specific cases appear. For example, p3 "multiple binding and unbinding events, as shown by the minimum distance curves" does not give an entire description of the variability seen in Fig S1, e.g. LCN1 versus GM2A.*

      We now discuss in more detail the variability seen in Fig. S1 and attribute it to different membrane binding affinities of the proteins in our dataset. We also discuss how this variability could reflect the diversity of organellar membranes to which these proteins bind in vivo.

      "____Notably, the proteins in our dataset display distinct binding affinities, with some proteins showing very transient binding while others remain membrane-bound for most of the simulation trajectory (Fig. S1). This behavior could be, in part, attributed to the wide diversity of organellar membranes to which the LTDs in our dataset bind in vivo, and to the comparative simplicity of our in silico model DOPC lipid bilayers."

      • Later the "excellent agreement" for the data in Fig S2 is not quantified, which does not allow the reader to know whether it is better than it would have been with other methods (SASA, OPM, DREAMM). *

      We have explicitly quantified this agreement by directly comparing the experimental results with our in silico assay, and we further compared it against two alternative methods: OPM and DREAMM. In detail, we identified 12 experimentally characterized spots suggested to be involved in membrane binding in our protein dataset (see shaded blue regions in Fig. S2). Our method identifies all 12 (100 %), while DREAMM identifies 7 of them (58 %) and OPM 4 out of 8 (50 %), since of the 12 proteins we tested, only 7 are available in the OPM database. Overall, even though our approach is much noisier than the others, and thus suggests multiple binding regions that are not currently supported by experimental observations, physics-based methodologies appear to remain a preferable strategy to characterize the binding of peripheral proteins to lipid bilayers. Given the limited size of our dataset, we prefer not to make a direct comparison between our assay and OPM/DREAMM in the main text, as this would not be representative of the various methodologies.

      *p5 commenting on Fig2b the case of Osh6 that appears to disagree should probably be mentioned. *

      We now discuss this case, and attribute this disagreement to insufficient sampling for the peculiar case of Osh6:

      "____One interesting exception in our database appears to be Osh6, where the experimentally determined membrane-binding region at the N-terminus (https://doi.org/10.1038/s41467-019-11780-y) is only marginally binding to the lipid bilayer in silico and it also appears to have limited contribution to PC1. However, our simulations are unable to sample the large conformational changes that the N-terminal lid of Osh6 has been proposed to undergo from its lipid-bound to its apo state, indicating that insufficient sampling could be the reason for this apparent discrepancy."

      *

      -The data and the methods are generally well presented allowing to be reproduced.

      • The experiments adequately replicated with adequate statistical analysis. *

      * Minor comments: *

      * - When presenting the dataset the authors could probably detail a bit more the protocol undertaken to chose the cases. In particular it is unclear whether the chosen proteins have any membrane selectivity, which in principle could be affected by the choice of lipid used here.*

      We have now added to Table 1 a column listing the organelles to which the different LTPs have been shown to localize (source: UniProt). As a model membrane, we opted to use a pure DOPC bilayer, both for simplicity and to compare membrane binding in a uniform setting. We foresee that future studies investigating the membrane specificity of the various proteins will shed further light on the molecular mechanism of LTPs. Finally, we also indicate that our choice of proteins was mainly driven by the availability of lipid-bound structures in the Protein Data Bank. We have added the following sentences to the main text:

      "____Specifically, we selected all LTPs for which a crystallographic structure in complex with a lipid was available at the start of our project, plus two additional proteins (GM2A and LCN1) to increase the structural diversity of our dataset (Fig. 1a)"

      and

      "____Notably, the proteins in our dataset display distinct binding affinities, with some proteins showing very transient binding while others remain membrane-bound for most of the simulation trajectory (Fig. S1). This behavior could be, in part, attributed to the wide diversity of organellar membranes to which the LTDs in our dataset bind in vivo, and to the comparative simplicity of our in silico model DOPC lipid bilayers."

      *- The authors could probably give some indication of how much of the variance is explained by PC1 and comment briefly on the choice to ignore other PCs. *

      PC1 explains 38 % of the total variance, on average. This means that PC1 has a large contribution to the variance, especially in comparison to the other PCs. For instance, PC2 only accounts for 17 % of the total variance. This is the reason we limited our discussion to PC1. We have added a table to the Supplementary Information quantifying the variance explained by PC1 and PC2, and added the following sentence to the main text:

      "____We specifically focused on PC1 as it explains most of the variance in the dynamics (38% on average for all the proteins in our dataset)____. "

      * - When analyzing the residues involved in the interaction with the membrane the results could probably be compared with that of the systematic analysis performed recently: Tubiana, T., Sillitoe, I., Orengo, C., & Reuter, N. (2022). Dissecting peripheral protein-membrane interfaces. PLOS Computational Biology, 18(12), e1010346. *

      We have added in the text a reference to the work by Tubiana et al and we have further stressed that our results agree with previous observations (including theirs). This includes the preference for Lys over Arg and the importance of protruding hydrophobes:

      "____Concomitant analysis of all LTDs (Fig. 1d) indicates that the membrane binding interface of LTDs is enriched in the positively charged amino acid Lysine, as this amino acid is less membrane-disruptive than Arginine (22), and aromatic/hydrophobic ones (Phe, Leu, Val, Ile). This confirms previous observations, as (i) binding of negatively charged lipids via positively charged residues and (ii) hydrophobic insertions are two of the main mechanisms involved in membrane binding by peripheral proteins (22-27)."

      * - In the discussion on allostery/conformational selection might not be centered so much on enzymes. *

      We thank the reviewer for this important observation. We have now included in the Discussion the following paragraph that provides additional references and discussion of membrane transporters and receptors.

      "____Notably, the conformational plasticity we observe for LTPs is reminiscent of other, previously described, functional protein mechanisms, including enzyme dynamics during catalysis (____DOI: 10.1126/science.1066176____), the alternating-access model of membrane transporters (____https://doi.org/10.1038/nsmb.3179____) or GPCR dynamics (____https://doi.org/10.1021/acs.chemrev.6b00177____). In all these cases, protein dynamics is strongly coupled to ligand binding and protein function, be it for signaling, transport or enzymatic activity. Unlike in these fields, however, the contribution of structural and spectroscopic studies to uncovering LTP dynamics remains quite limited, and our simulations provide an important contribution to fill this gap. We hope that our results will motivate researchers to increase efforts to experimentally quantify LTP conformational plasticity, e.g. by structural determination of LTPs in different states (or bound to different lipids) or by single-molecule spectroscopy studies."

      * Reviewer #3 (Significance (Required)): *

      *

      The article shows convincing results on the debated issue of the mechanism of lipid transport by lipid transfer proteins. *

      First the study employs molecular modelling to allow a rather large test on 12 cases. The molecular dynamics experiments allow the authors to draw clear hypotheses on role of protein dynamics on the interaction with membranes and the effect on bound lipids on the modification of this dynamics.

      *Then the authors use this knowledge to design experiments that largely confirm those hypotheses. The results should therefore be interesting for a large audience of biochemists and cell biologists interested in lipid transport in the cell. *

      We thank Reviewer #3 for their very positive evaluation and contextualization of our work.

    1. Since the family is the site where biology, society and psychology converge most evidently, Freud's rooting of sexuality in a determinate way in the family makes perfect sense. Sexual desire may indeed be deeply structured by infantile experience, internal conflicts not fully resolved, and repressions of instincts in early life. But the drive model also has blind spots: it obscures the importance of later development and adult experience, understates the impact of the social milieu that shapes those experiences, and retains a telos of normal sexual development, even as it expands the meaning of the word "sexual". In the final analysis, it can be argued that Freud rendered nature partly social, moving beyond the biological determinism of sexology to begin to understand how desire is constituted intersubjectively. But lacking a theory of social structure beyond the family, the drive model of sexuality tended to downplay the actual links between social structure and sexual behavior.

      Some people believe that our feelings about love and our bodies are shaped when we are very young and that can affect us as we grow up. But this idea only focuses on the family and doesn't think about how other things and experiences in life can also shape our feelings. So, it's important to remember that there are many things that can make us feel different and that how we feel is not only because of our family.

    Annotators

  6. Jun 2023
    1. Author Response:

      Reviewer #1 (Public Review):

      […] The major strength of the study is the elegant and well-powered data set. Longitudinal data on this scale is very difficult to collect, especially with patient cohorts, so this approach represents an exciting breakthrough. Analysis is straightforward and clearly presented. However, no multiple comparison correction is applied despite many different tests. While in general I am not convinced of the argument in the citation provided to justify this, I think in this case the key results are not borderline (p<0.001) and many of the key effects are replications, so there are not so many novel/exploratory hypothesis and in my opinion the results are convincing and robust as they are. The supplemental material is a comprehensive description of the data set, which is a useful resource.

      The authors achieved their aims, and the results clearly support the conclusion that the AD and mean confidence in a perceptual task covary longitudinally. I think this study provides an important impact to the project of computational psychiatry. Specifically, it shows that the relationship between transdiagnostic symptom dimensions and behaviour is meaningful within as well as across individuals.

      Response: We thank the reviewer for their appraisal of our paper and positive feedback on the main manuscript and supplementary information. We agree with the reviewer that the lack of multiple comparison corrections can also be justified by the key findings being replications and not of borderline significance. We have added this additional justification to the manuscript (Methods, Statistical Analyses, page 15, line 568: “Adjustments for multiple comparisons were not conducted for analyses of replicated effects”).

      Reviewer #2 (Public Review):

      […] The major strength and contribution of this study is the use of a longitudinal intervention design, allowing the investigation of how the well-established link between underconfidence and anxious-depressive symptoms changes after treatment. Furthermore, the large sample size of the iCBT group is commendable. The authors employed well-established measures of metacognition and clinical symptoms, used appropriate analyses, and thoroughly examined the specificity of the observed effects.

      However, due to the small effect sizes, the antidepressant and control groups were underpowered, reducing comparability between interventions and the generalizability of the results. The lack of interaction effect with treatment makes it harder to interpret the observed differences in confidence, and practice effects could conceivably account for part of the difference. Finally, it was not completely clear to me why, in the exploratory analyses, the authors looked at the interaction of time and symptom change (and group), since time is already included in the symptom change index.

      Response: We thank the reviewer for their succinct summary of the main results and strengths of our study. We apologise for the confusion in how we described that analysis. We examine state-dependence, i.e., the relationship between symptom change and metacognition change, in two ways in the paper, perhaps somewhat redundantly: (1) by correlating change indices for both measures (e.g., as plotted in Figure 3D) and (2) by doing a very similar regression-based repeated-measures analysis, i.e., mean confidence ~ time*anxious-depression score change, where mean confidence is entered with two datapoints, one for pre- and one for post-treatment (i.e., within-person), and anxious-depression change is a single value per person (between-person change score). This allowed us to test if those with the biggest change in depression had a larger effect of time on confidence. This has been added to the paper for clarification (Methods, Statistical Analysis, page 14, line 553-559: “To determine the association between change in confidence and change in anxious-depression, we used (1) Pearson correlation analysis to correlate change indices for both measures and, (2) regression-based repeated-measures analysis: mean confidence ~ time*anxious-depression score change, where mean confidence is entered with two datapoints (one for pre- and one for post-treatment i.e., within-person) and anxious-depression change is a single value per person (between-person change score)”).

      The analyses have also been reported as regressions in the Results for consistency (Treatment Findings: iCBT, page 5, line 197-204: “To test if changes in confidence from baseline to follow-up scaled with changes in anxious-depression, we ran a repeated-measures regression analysis with per-person changes in anxious-depression as an additional independent variable. We found this was the case, evidenced by a significant interaction effect of time and change in anxious-depression on confidence (b=-0.12, SE=0.04, p=0.002)… This was similarly evident in a simple correlation between change in confidence and change in anxious-depression (r(647)=-0.12, p=0.002)”).
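      To make the structure of the model formula mean confidence ~ time*anxious-depression score change concrete, here is a minimal sketch that builds the corresponding design matrix (intercept, time, change score, and their product) and fits it by ordinary least squares in plain Python. This is a deliberate simplification: ordinary least squares ignores the within-person pairing that a repeated-measures analysis accounts for, and it is not the authors' code. All function names and the noise-free toy data are hypothetical; the coefficients used to generate the data are arbitrary, apart from the interaction term, which is set to the reported b=-0.12 purely for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_interaction_model(rows):
    """Fit confidence = b0 + b1*time + b2*change + b3*time*change.

    `rows` holds (confidence, time, ad_change) tuples, two per person:
    time is 0 (pre) or 1 (post); ad_change is one value per person.
    Returns [b0, b1, b2, b3] from the normal equations.
    """
    design = [[1.0, t, d, t * d] for _, t, d in rows]
    y = [conf for conf, _, _ in rows]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def make_toy_rows(changes, b0, b1, b2, b3):
    """Generate noise-free pre/post rows from known coefficients."""
    rows = []
    for d in changes:
        rows.append((b0 + b2 * d, 0.0, d))                 # pre-treatment
        rows.append((b0 + b1 + (b2 + b3) * d, 1.0, d))     # post-treatment
    return rows


if __name__ == "__main__":
    rows = make_toy_rows([-2.0, -1.0, 0.0, 1.0, 2.0], 3.0, 0.2, 0.05, -0.12)
    print([round(b, 3) for b in fit_interaction_model(rows)])
```

      In this parameterisation the interaction coefficient is the quantity of interest: a negative value means that participants whose anxious-depression scores fell the most showed the largest increase in confidence over time.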

      This longitudinal study informs the field of metacognition in mental health about the changeability of biases in confidence. It advances our understanding of the link between anxiety-depression and underconfidence consistently found in cross-sectional studies. The small effects, however, call the clinical relevance of the findings into question. I would have found it useful to read more in the discussion about the implications of the findings (e.g., why is it important to know that the confidence bias is state-dependent; given the effect size of the association between changes in confidence and symptoms, is the state-trait dichotomy the right framework for interpreting these results; suggestions for follow-up studies to better understand the association).

      Response: Thank you for this comment. We have elaborated on the implications of our findings in the Discussion, including the relevance of the state-trait dichotomy to future research and how more intensive, repeated testing may inform our understanding of the state-like nature of metacognition (Discussion, Limitations and Future Directions, page 10, line 378-380: “More intensive, repeated testing in future studies may also reveal the temporal window at which metacognition has the propensity to change, which could be more momentary in nature.”).

      Reviewer #3 (Public Review):

      […] I think these findings are exciting because they directly relate to one of the big assumptions when relating cognition to mental health - are we measuring something that changes with treatment (is malleable), so might be mechanistically relevant, or even useful as a biomarker?

      This work is also useful in that it replicates a finding of heightened confidence in those with compulsivity, and lowered confidence in those with elevated anxious-depression.

      One caveat to the interest of this work is that it doesn't allow any causal conclusions to be drawn, and only measures two timepoints, so it's hard to tell if changes in confidence might drive treatment effects (but this would be another study). The authors do mention this in the limitations section of the paper.

      Another caveat is the small sample in the antidepressant group.

      Some thoughts I had whilst reading this paper: to what extent should we be confident that the changes are not purely due to practice? I appreciate there is a relationship between improvement in symptoms and confidence in the iCBT group, but this doesn't completely rule out a practice effect (for instance, you can imagine a scenario in which those whose symptoms have improved are more likely to benefit from previously having practiced the task).

      Response: We thank the reviewer for commenting on the implications of our findings and we agree with the caveats listed. We thank the reviewer for raising this point about practice effects. A key thing to note is that this task does not have a learning element with respect to the core perceptual judgement (i.e., accuracy), which is the target of the confidence judgment itself. While there is a possibility of increased familiarity with the task instructions and procedures with repeated testing, the task is designed to adjust the difficulty to account for any improvements, so accuracy is stable. We see that we may not have made this clear in some of our language around accuracy vs. perceptual difficulty and have edited the Results to make this distinction clearer (Treatment Findings: iCBT, pages 4-5, lines 184-189: “Although overall accuracy remained stable due to the staircasing procedure, participants’ ability to detect differences between the visual stimuli improved. This was reflected as the overall increase in task difficulty to maintain the accuracy rates from baseline (dot difference: M=41.82, SD=11.61) to follow-up (dot difference: M=39.80, SD=12.62), (b=-2.02, SE=0.44, p<0.001, r²=0.01)”).

      However, it is true that there can be a ‘practice’ effect in the sense that one may feel more confident (despite the same accuracy level) due to familiarity with a task. One reason we do not subscribe to the proposed explanation for the link between anxious-depression change and confidence change is that the other major aspect of behaviour that improved with practice did so in a manner unrelated to clinical change. As noted above in the quoted text, participants’ discrimination improved from baseline to follow-up, reflected in the need for a higher difficulty level to maintain accuracy around 70%. Crucially, this was not associated with symptom change. This speaks against a general mechanism where symptom improvement leads to increased practice effects in general. Only changes in confidence specifically are associated with improved symptoms. We have provided more detail on this in the Discussion (page 9, lines 324-326: “This association with clinical improvements was specific to metacognitive changes, and not changes in task performance, suggesting that changes in confidence do not merely reflect greater task familiarity at follow-up.”).

      Relatedly, to what extent is there a role for general task engagement in these findings? The paper might be strengthened by some kind of control analysis, perhaps using (as a proxy for engagement) the data collected about those who missed catch questions in the questionnaires.

      Response: Thank you for your comment. We included the details of data quality checks in the Supplement. Given the small number of participants that failed more than one attention check (1% of the iCBT arm) and that all those participants passed the task exclusion criteria, we made the decision to retain these individuals for analyses. We have since examined whether excluding this small number of individuals impacts our findings. Excluding those that failed more than one catch item did not affect the significance of results, which has now been added to the Supplementary Information (Data Quality Checks: Task and Clinical Scales, page 5, lines 181-185: “Additionally, excluding those that failed more than one catch item in the iCBT arm did not affect the significance of results, including the change in confidence (b=0.16, SE=0.02, p<0.001), change in anxious-depression (b=-0.32, SE=0.03, p<0.001), and the association between change in confidence and change in anxious-depression (r(638)=-0.10, p=0.011)”).

      I was also unclear what the findings about task difficulty might mean. Are confidence changes purely secondary to improvements in task performance generally - so confidence might not actually be 'interesting' as a construct in itself? The authors could have commented more on this issue in the discussion.

      Response: Thank you for this comment and sorry it was not clear in the original paper. As we discussed in a prior reply, accuracy, i.e., the proportion of correct selections (the target of confidence judgements), is different from the difficulty of the dot discrimination task that each person receives on a given trial. We had provided more details on task difficulty in the Supplement. Accuracy was tightly controlled in this task using a ‘two-down one-up’ staircase procedure, in which equally sized changes in dot difference occurred after each incorrect response and after two consecutive correct responses. The task is more difficult when the dot difference between stimuli is lower, and less difficult when the dot difference between stimuli is greater. Therefore, task difficulty refers to the average dot difference between stimuli across trials. Crucially, task accuracy did not change from baseline to follow-up, only task difficulty. Moreover, changes in task difficulty were not associated with changes in anxious-depression, while changes in confidence were, indicating that confidence is the clinically relevant construct for change in symptoms.

      We appreciate that this may not have been clear from the description in the main manuscript, and have added more detail on task difficulty to the Methods (Metacognition Task, page 14, lines 540-542: “Task difficulty was measured as the mean dot difference across trials, where more difficult trials had a lower dot difference between stimuli.”) and Results (Treatment Findings: iCBT, pages 4-5, lines 184-186: “Although overall accuracy remained stable due to the staircasing procedure, participants’ ability to detect differences between the visual stimuli improved.”). We have also elaborated more on how improvements in symptoms are associated with change in confidence, not task performance in the Discussion (page 9, lines 324-326: “This association with clinical improvements was specific to metacognitive changes, and not changes in task performance, suggesting that changes in confidence do not merely reflect greater task familiarity at follow-up”).
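      The ‘two-down one-up’ rule described in this response can be sketched in a few lines. The version below is a minimal illustration under stated assumptions, not the task code used in the study: the starting dot difference, the step size of one dot, the lower bound and the function names are all hypothetical. It follows the rule as described: the dot difference increases (easier) after every incorrect response and decreases (harder) after two consecutive correct responses, using equally sized steps.

```python
def staircase_step(dot_difference, correct, streak, step=1, floor=1):
    """Apply one trial of a two-down one-up staircase.

    Returns (new_dot_difference, new_streak), where `streak` counts
    consecutive correct responses since the last difficulty change.
    """
    if not correct:
        # One incorrect response: make the task easier.
        return dot_difference + step, 0
    streak += 1
    if streak == 2:
        # Two consecutive correct responses: make the task harder.
        return max(floor, dot_difference - step), 0
    return dot_difference, streak


def run_staircase(responses, start=40, step=1, floor=1):
    """Return the dot difference shown on each trial and the value
    that would be shown on the next trial."""
    shown = []
    diff, streak = start, 0
    for correct in responses:
        shown.append(diff)
        diff, streak = staircase_step(diff, correct, streak, step, floor)
    return shown, diff


def mean_difficulty(shown):
    """Task difficulty as the mean dot difference across trials
    (lower values mean a harder task)."""
    return sum(shown) / len(shown)


if __name__ == "__main__":
    responses = [True, True, False, True, True, True, True]
    shown, next_diff = run_staircase(responses, start=40)
    print(shown, next_diff, mean_difficulty(shown))
```

      Because the rule needs two correct responses to step down but only one error to step up, a staircase of this kind settles near the stimulus level at which roughly 70 % of responses are correct, which is why accuracy stays stable while the mean dot difference can still shift between sessions.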

      To make code more reproducible, the authors could have produced an R notebook that could be opened in the browser without someone downloading the data, so they could get a sense of the analyses without fully reproducing them.

      Response: Thank you for your comment. We appreciate that an R notebook would be even better than how we currently share the data and code. While we will consider using Notebooks in future, we checked and converting our existing R script library into R Notebooks would require a considerable amount of reconfiguration that we cannot devote the time to right now. We hope that nonetheless the commitment to open science is clear in the extensive code base, commenting and data access we are making available to readers.

      Rather than reporting full study details in another publication I would have found it useful if all relevant information was included in a supplement (though it seems much of it is). This avoids situations where the other publication is inaccessible (due to different access regimes) and minimises barriers for people to fully understand the reported data.

      Response: We agree this is good practice – the Precision in Psychiatry study is very large, with many irrelevant components with respect to the present study (Lee et al., BMC Psychiatry, 2023). For this reason, we tried to provide all that was necessary and only refer to the Precision in Psychiatry study methods for fine-grained detail. Upon review, the only thing we think we omitted that is relevant is information on ethical approval in the manuscript, which we have now added (Methods, Participants, page 11, lines 412-417: “Further details of the PIP study procedures that are not specific to this study can be found in a prior publication (21). Ethical approval for the PIP study was obtained from the Research Ethics Committee of School of Psychology, Trinity College Dublin and the Northwest-Greater Manchester West Research Ethics Committee of the National Health Service, Health Research Authority and Health and Care Research Wales”). If any further information is lacking, we are happy to include it here also.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer 1:

      1-Localization of ESYT1 and SYNJ2BP

      The claim of a localization at ER-mitochondria contacts relies on two type of assays. Light microscopy and subcellular fractionation. Concerning microscopy, while the staining pattern is obviously colocalizing with the ER (a control of specificity of staining using KO cells would nevertheless be desirable)

      the idea that ESYT1 foci "partially colocalized with mitochondria" is either trivial or unfounded

      Every cellular structure is "partially colocalized with mitochondria" simply by chance at the resolution of light microscopy

      If the meaning of the experiment is to show that ESYT1 'specifically' colocalizes with mitochondria, then this isn't shown by the data

      There is no quantification that the level of colocalization is more than expected by chance

      nor that it is higher than that of any other ER protein

      Moreover, the author's model implies that ESYT1 partial colocalization with mitochondria is, at least partially, due to its interaction with SYNJ2BP. This is not tested.

      • To analyze and measure MERCs parameters and functions, we used a set of validated methods described in the following specialized review articles (Eisenberg-Bord, Shai et al. 2016, Scorrano, De Matteis et al. 2019).
      • To support and confirm the localization of the ESYT1-SYNJ2BP complex at MERCs, we performed supplementary BioID analysis using ER-targeted BirA*, OMM-targeted BirA* and ER-mitochondria tether BirA* (Table S1, Figure S1 and Figure 1 A and B). These results confirmed the specificity of the interaction of the two partners. ESYT1 is not identified as a prey in OMM BioID and SYNJ2BP is not identified in ER BioID; on the other hand, both partners are identified in the ER-mitochondria tether BioID.
      • To improve our description of the partial localization of ESYT1 at mitochondria, we performed a quantitative analysis using confocal microscopy on control human fibroblasts stably overexpressing SEC61B-mCherry as an ER marker which were labelled with ESYT1 and TOMM40 for mitochondria. We measured the % of ESYT1 signal colocalizing with mitochondria and the % of mitochondria positive for ESYT1 (Figure 1E).
      • To demonstrate that ESYT1 partial colocalization with mitochondria is, at least partially, due to its interaction with SYNJ2BP, we performed a quantitative analysis using confocal microscopy. Human control fibroblasts, KO SYNJ2BP fibroblasts and SYNJ2BP overexpressing fibroblasts were labelled with ESYT1, TOMM40 for mitochondria and CANX for ER. We measured the % of ESYT1 signal colocalizing with mitochondria in each condition (Figure 3C).

      Membranes (MAM) can be purified and are enriched for proteins that localize at ER-mitochondria contacts. This idea originated in the early 90's and since then, myriad of papers has been using MAM purification, and whole MAM proteomes have been determined. Yet the evidence that MAM-enriched proteins represent bona fide ER-mitochondria-contact-enriched proteins (as can nowadays be determined by microscopy techniques) remain scarce. Here, anyway, ESYT1 fractionation pattern is identical to that of PDI, a marker of general ER, with no indication of specific MAM accumulation.

      • To highlight the enrichment of ESYT1 in the MAM fraction, we quantified the ESYT1 signal in each fraction. These results show a fractionation pattern similar to that of the MAM resident protein SIGMAR1 (Figure 1F).

      For SYNJ2BP, it is different as it is more enriched in the MAM than the general mitochondrial marker PRDX3. However, PRDX3 is a matrix protein, making it a poor comparison point, since SYNJ2BP is an OMM protein.

      • To confirm the partial enrichment of SYNJ2BP in the MAM fraction compared to another outer mitochondrial membrane protein, we added the signal of the well characterized OMM protein CARD19 (Rios, Zhou et al. 2022).

      Again, the model implies that ESYT1 and SYNJ2BP accumulation in the MAM should be dependent on each other. This is not tested.

      • As described above, we demonstrated in Figure 3C that the accumulation of ESYT1 at mitochondria is, at least partially, dependent on the quantity of SYNJ2BP.

      • We moreover showed a reciprocal effect in Figure 3E. A quantitative analysis using confocal microscopy demonstrated that the effect of SYNJ2BP overexpression on MERC formation is partially dependent on the presence of ESYT1.

      2-ESYT1-SYNJ2BP interaction.

      The starting point of the paper is a BioID signal for SYNJ2BP when BioID is fused to ESYT1. One confirmation of the interaction comes in figure 4, using blue native gel electrophoresis and assessing comigration. Because BioID is promiscuous and comigration can be spurious, better evidence is needed to make this claim. This is exemplified by the fact that, although SYNJ2BP is found in a complex comigrating with RRBP1, according to the BN gel, this slow migrating complex isn't disturbed by RRBP1 knockdown, but is somewhat disturbed by ESYT1 knockdown. More than a change in abundance, a change in migration velocity when either protein is absent would be evidence that these comigrating bands represent the same complex.

      • We showed in Figure 4C that the presence of SYNJ2BP in a complex of a molecular weight similar to that of ESYT1 (410 kDa) is totally dependent on the presence of ESYT1, suggesting an interaction of the 2 proteins.
      • To confirm this interaction, in Figure 4A we analyzed by BN-PAGE cells overexpressing SYNJ2BP together with a 3xFlag tagged version of ESYT1. As a result of the addition of the Flag tag, the complex positive for ESYT1 shifted to a higher molecular weight. The complex positive for SYNJ2BP shifted to a similar molecular weight, demonstrating the interaction and dependence of the 2 partners.

      ESYT1-SYNJ2BP interaction needs to be tested by coimmunoprecipitation of endogenous proteins, yeast-2-hybrid, in vitro reconstitution or any other confirmatory methods.

      • To confirm the interaction of the 2 partners, we performed co-immunoprecipitation of the ESYT1-3xFlag protein, which we showed in Figure 1H to form complexes similar to those of the endogenous protein. SYNJ2BP is found as the strongest prey, followed by ESYT2 and SEC22B, two described interactors of ESYT1, confirming the quality of the analysis (Table S2) (Giordano, Saheki et al. 2013, Gallo, Danglot et al. 2020).

      3-Tethering by ESYT1-SYNJ2BP.

      This is assessed by light and electron microscopy. Absence of ESYT1 decreases several metrics for ER-mitochondria contacts (whether absence of SYNJ2BP has the same effect isn't tested).

      • Using PLA (proximity ligation assay) we demonstrated that the loss of SYNJ2BP leads to a decrease in MERCs (Figure 7 H and I), confirming previous studies (Ilacqua, Anastasia et al. 2022, Pourshafie, Masati et al. 2022).

      This interesting phenomenon could be due to many things, including but not limited to the possibility that "ESYT1 tethers ER to mitochondria".

      This statement and the respective subheading title are therefore clearly overreaching and should be either supported by evidence or removed.

      Indeed, absence of ESYT1 ER-PM tethering and lipid exchange could have knock-on effects on ER-mito contacts, therefore strong statements aren't supported.

      Moreover, the effect on ER-mitochondria contact metrics could be due to changes in ER-mitochondria contact indeed but may also reflect changes in ER and/or mitochondria abundance and/or distribution, which favour or disfavour their encounter. Abundance and distribution of both organelles are not controlled for.

      • The mitochondrial phenotypes caused by the loss of ESYT1 are all rescued by the introduction of an artificial mitochondrial-ER tether, demonstrating that they are due to loss of the tethering function of ESYT1.

      Finally, the authors repeat a finding that SYNJ2BP overexpression induces artificial ER-mitochondria tethering. Again, according to the model, this should be, at least in part, due to interaction with ESYT1. Whether ESYT1 is required for this tethering enhancement isn't tested.

      • As described above, we demonstrated in Figure 3C that the accumulation of ESYT1 at mitochondria is, at least partially, dependent on the quantity of SYNJ2BP.

      • We moreover showed a reciprocal effect in Figure 3F. A quantitative analysis using confocal microscopy demonstrated that the effect of SYNJ2BP overexpression on MERC formation is partially dependent on the presence of ESYT1.

      4-Phenotypes of ESYT1/SYNJ2BP KD or KO.

      The study goes into detail to show that downregulation of either protein yields physiological phenotypes consistent with decreased ER-mitochondria tethering. These phenotypes include calcium import into mitochondria and mitochondrial lipid composition.

      Figure 5 shows that histamine-evoked ER-calcium release cause an increase in mitochondrial calcium, and this increase is reduced in absence of ESYT1, without detectable change in the abundance of the main known players of this calcium import. This is rescued by an artificial ER-mitochondria tether. However, Figure 5D shows that the increase in calcium concentration in the cytosol upon histamine-evoked ER calcium release is equally impaired by ESYT1 deletion, contrary to expectation. Indeed, if the impairment of mitochondrial calcium import was due to improper ER-mitochondria tethering in ESYT1 mutant cells, one would expect more calcium to leak into the cytosol, not less.

      The remaining explanation is that ESYT1 knockout desensitizes the cells to histamine, by affecting GPCR signalling at the PM, something unexplored here.

      In any case, a decreased calcium discharge by the ER upon histamine treatment, explains the decreased uptake by mitochondria.

      The authors argue that ER calcium release is unaffected by ESYT1 KO, but crucially use thapsigargin instead of histamine to show it. Thus, the most likely interpretation of the data is that ESYT1 KO affects histamine signalling and histamine-evoked calcium release upstream of ER-mitochondria contacts.

      • Silencing ESYT1 impairs SOCE efficiency in Jurkat cells (Woo, Sun et al. 2020), but not in HeLa cells (Giordano, Saheki et al. 2013, Woo, Sun et al. 2020). Analysis of the role of ESYT1 in HeLa cells prevents confounding effects due to the loss of ESYT1 at ER-PM. In this model, knock-down of ESYT1 led to a decrease of mitochondrial Ca2+ uptake from the ER upon histamine stimulation, as monitored by genetically encoded Ca2+ indicator targeted to mitochondrial matrix (Figure 5A and B). ESYT1 silencing in HeLa cells did not impact ER Ca2+ store measured by the ER-targeted R-GECO Ca2+ probe (Figure 5C and D). The expression of the artificial mitochondria-ER tether was able to rescue mitochondrial Ca2+ defects observed in ESYT1 silenced cells (Figure 5B), confirming that the observed anomalies are specifically due to MERC defects.
      • In contrast, loss of ESYT1 impaired SOCE efficiency in fibroblasts (Figure 6 A and B). This phenotype was fully rescued by re-expression of ESYT1-Myc but not the artificial tether. We therefore investigated the influence of ESYT1 loss on cytosolic Ca2+ concentration following ATP (Figure 6F to H) or histamine stimulation (Figure S3 D to F), both of which showed a reduced cytosolic Ca2+ concentration and uptake in ESYT1 KO cells. This phenotype was fully rescued by the re-expression of ESYT1-Myc but not the artificial tether. Measurement of cytosolic Ca2+ in Ca2+-free media after treatment with thapsigargin, an inhibitor of the sarco/endoplasmic reticulum Ca2+ ATPase SERCA that blocks Ca2+ pumping into the ER, showed that ESYT1 KO does not influence the total ER Ca2+ pool (Figure 6K and L). However, ER-Ca2+ release capacity upon histamine stimulation (Figure 6I and J) is decreased in ESYT1 KO cells. This phenotype was fully rescued by the re-expression of ESYT1-Myc but not the artificial tether. Loss of ESYT1 decreased the Ca2+ uptake capacities of mitochondria after activation with histamine (Figure S3 A to C) or ATP (Figure 6 C to E). This phenotype was rescued by re-expression of ESYT1-Myc and also the engineered ER-mitochondria tether. Thus, despite the ER-Ca2+ release defect observed after ESYT1 loss, the artificial tether fully rescued the mitochondrial phenotype.
      • These results highlight the distinct and dual roles of ESYT1 in Ca2+ regulation at the ER-PM and at MERCs.

      The data with SYNJ2BP deletion are more compatible with decreased ER-mito contacts, as no decrease in cytosolic calcium is observed. This is compatible with the previously proposed role of SYNJ2BP in ER-mitochondria tethering, but the difference with ESYT1 rather argues that both proteins affect calcium signaling by different means, meaning they act in different pathways.

      • We explain the different results concerning cytosolic calcium by the fact that ESYT1 is a bi-localized protein with dual functions in cellular calcium regulation, being implicated both in SOCE at the ER-PM and in mitochondrial calcium uptake at MERCs. On the other hand, SYNJ2BP is only present at MERCs and its loss does not influence PM-ER signaling or ER-Ca2+ release.

      Finally, the study delves into mitochondrial lipids to "investigated the role of the SMP-domain containing protein ESYT1 in lipid transfer from ER to mitochondria". In reality, it is not ER-mitochondria lipid transport that is under scrutiny, but general lipid homeostasis, and changes in ER-PM lipids could have knock-on effects on mitochondrial lipids without the need to invoke disruptions in ER-mitochondria transfer activity.

      • The fact that the artificial tether, which specifically rescues MERCs, fully rescues the lipid phenotype argues for a direct loss of MERC tethering function when ESYT1 is missing.

      The changes observed are interesting but could be due to anything. Surprisingly, PCA analysis shows that the rescue of the knockout by the ESYT1 gene clusters with the rescue by the artificial tether, and not with the wildtype. This indicates that overexpressing either ESYT1 or a tether causes similar lipidomic changes. These could be due, for instance, to ER stress caused by protein overexpression, and not to a rescue.

      • To verify whether overexpression of ESYT1 or the artificial tether induces ER stress, we performed a WB analysis to compare markers of ER stress in control fibroblasts, KO ESYT1 fibroblasts, and KO ESYT1 fibroblasts overexpressing ESYT1-Myc or the tether (Figure S4C). This showed no changes in the levels of several different markers of ER stress or cell death.

      Reviewer 2:

      1) the interaction between those proteins is direct,

      2) if SYNJ2BP is necessary and sufficient to localize E-Syt1 at MERC, and

      3) if MERCs extension induced by SYNJ2BP is dependent on E-Syt1.

      Those points are important to investigate because SYNJ2BP has already been shown to induce MERCs by interacting with the ER protein RRBP1. In addition, some experiments need to be better quantified.

      Major comments:

      E-syt1/SYNJ2BP in MERCs formation: the authors provide several convincing lines of evidence that both proteins are in the same complex (proximity labelling, localization in the same complex in BN-PAGE, localization in MAM) but it is not clear to what extent the direct interaction between both proteins regulates ER-mitochondria tethering.

      1- Pull down experiments or a BiFC strategy could be performed to show the direct interaction between both proteins.

      • We showed in Figure 4C that the presence of SYNJ2BP in a complex of a molecular weight similar to that of ESYT1 (410 kDa) is totally dependent on the presence of ESYT1, suggesting an interaction of the 2 proteins.
      • To confirm this interaction, in Figure 4A we analyzed by BN-PAGE cells overexpressing SYNJ2BP together with a 3xFlag tagged version of ESYT1. As a result of the addition of the Flag tag, the complex positive for ESYT1 shifted to a higher molecular weight. Significantly, the complex positive for SYNJ2BP shifted to a similar molecular weight, demonstrating the interaction and dependence of the 2 protein partners.
      • To confirm the interaction of the 2 partners, we performed co-immunoprecipitation of the ESYT1-3xFlag protein (Table S2). SYNJ2BP was found as the strongest prey, followed by ESYT2 and SEC22B, two described interactors of ESYT1, confirming the quality of the analysis (Giordano, Saheki et al. 2013, Gallo, Danglot et al. 2020).

      2- SYNJ2BP OE has already been demonstrated to increase MERCs, and this is dependent on the ER binding partner RRBP1 (10.7554/eLife.24463). Therefore, it would be of interest to perform OE of SYNJ2BP in KO Esyt1 to address the question of whether ESyt1 is also required to increase MERCs.

      • A quantitative analysis using confocal microscopy demonstrated that the effect of SYNJ2BP overexpression on MERC formation is partially dependent on the presence of ESYT1 (Figure 3F).

      3- The authors show that Esyt1 punctate size increases when SYNJ2BP is OE (Fig3C), but this can be indirectly linked to the increase of MERCs in the OE line. Thus, it could be interesting to test if the number/shape of E-syt1 punctate located close to mitochondria decreases in KO SYNJ2B. This could really show the dependence of SYNJ2BP for E-syt1 function at MERCs.

      • To improve our description of the partial localization of ESYT1 at mitochondria, we performed a quantitative analysis using confocal microscopy on control human fibroblasts stably overexpressing SEC61B-mCherry as an ER marker which were labelled with ESYT1 and TOMM40 for mitochondria. We measured the % of ESYT1 signal colocalizing with mitochondria and the % of mitochondria colocalizing with ESYT1 (Figure 1E).

      • To demonstrate that the partial colocalization of ESYT1 with mitochondria is, at least partially, due to its interaction with SYNJ2BP, we performed a quantitative analysis using confocal microscopy. Human control fibroblasts, KO SYNJ2BP fibroblasts and SYNJ2BP overexpressing fibroblasts were labelled with ESYT1, TOMM40 for mitochondria and CANX for ER. We measured the % of ESYT1 signal colocalizing with mitochondria in each condition (Figure 3C).

      Lipid analyses: the results of MS on isolated mitochondria clearly show that mitochondrial lipid homeostasis is affected in KO-Syt1 and rescued by expression of Syt1-Myc and the artificial mitochondria-ER tether. However, p.15, the authors wrote "The loss of ESYT1 resulted in a decrease of the three main mitochondrial lipid categories CL, PE and PI, which was accompanied by an increase in PC". As the results are expressed in mol%, this interpretation can be distorted by the fact that mathematically, if the content of one lipid decreases, the content of others will increase. I would suggest expressing the results in lipid quantity (nmol)/mg of mitochondrial proteins instead of mol%. This will clarify the role of E-Syt1 in mitochondrial lipid homeostasis and which lipids increase and decrease.
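
      The compositional effect described in the lipid-analysis comment can be illustrated with a short sketch. The lipid amounts below are hypothetical values chosen only for illustration, not data from the study: a single lipid class (PE) is halved in absolute terms, and the mol% of every other class rises even though their absolute amounts are unchanged.

```python
def mol_percent(amounts):
    """Convert absolute lipid amounts into mol% of the summed total."""
    total = sum(amounts.values())
    return {lipid: 100.0 * amount / total for lipid, amount in amounts.items()}


# Hypothetical absolute amounts (nmol lipid per mg mitochondrial protein).
control = {"PC": 40.0, "PE": 30.0, "PI": 10.0, "CL": 20.0}
# Assumed scenario: only PE changes; PC, PI and CL are untouched.
knockout = dict(control, PE=15.0)

control_pct = mol_percent(control)
knockout_pct = mol_percent(knockout)

for lipid in control:
    print(
        f"{lipid}: absolute {control[lipid]:.1f} -> {knockout[lipid]:.1f}, "
        f"mol% {control_pct[lipid]:.1f} -> {knockout_pct[lipid]:.1f}"
    )
```

      In this toy case PC appears to increase in mol% although its absolute amount is identical in both conditions, which is why absolute quantities per mg of protein are easier to interpret.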

      • We changed the sentence in the text as suggested.

      Also it could be of high interest to have the lipid composition of the whole cells to reinforce the direct involvement of E-Syt1 in mitochondrial lipid homeostasis and verify that the disruption of mitochondrial lipid homeostasis is not linked to a general perturbation of lipid metabolism as this protein acts at different MCSs.

      • This is beyond the scope of the project and we would argue that the results of such an experiment would be difficult to interpret.

      To better understand the impact of Esyt1 on mitochondria morphology, the authors could analyze the mitochondria morphology (size, shape, cristae) on their EM images of crt, KO and OE lines. Indeed, on OE (Fig3A), the mitochondria look bigger and with a different shape compared to crt.

      • As we do not observe obvious differences in mitochondrial morphology between control, KO and OE fibroblasts we do not think that quantitative analysis would add to the understanding of the effect of ESYT1 on mitochondrial function.

      Also, they performed a lot of BN-PAGE. Is it possible to check whether the mitochondrial respiratory chain super-complexes are affected on Esyt1 KO line compared to crt?

      • We decided to remove the data on the metabolic consequences of ESYT1 loss since they were too preliminary and required deeper investigation, focusing instead on the effect of ESYT1 loss on calcium homeostasis.

      Quantifications: some western blots need to be quantified (Fig 5K, 6J, S3E);

      • We did not observe obvious differences in the protein levels so we think that quantitation would not add significantly to the understanding of the differences in calcium dynamics that we report.

      Fig1A: Can the author provide a higher magnification of the triple labeling and perform quantification about the proportion of E-Syt1 punctate located close to mitochondria?

      • We added higher magnification of the same area in all channels and arrows that point to the foci of ESYT1 colocalizing with both ER and mitochondria (Figure 1D).

      • To improve our description of the partial localization of ESYT1 at mitochondria, we performed a quantitative analysis using confocal microscopy on control human fibroblasts stably overexpressing SEC61B-mCherry as an ER marker which were labelled with ESYT1 and TOMM40 for mitochondria. We measured the % of ESYT1 signal colocalizing with mitochondria and the % of mitochondria colocalizing with ESYT1 (Figure 1E).

      Minor comments:

      • Fig1E + text: according to the legend, the BN-PAGE has been performed on the heavy membrane fraction. Why do the authors speak about complexes at MAM in the text of the corresponding figure? Is it the MAM or the heavy fraction (MAM + mito + ER...)? If BN has been performed on heavy membranes, it is not a real proof that E-syt1 is in MAMs.

      • Heavy membranes have been used in this experiment. The text and conclusions have been changed accordingly.

      • On fig3C (panel crt): it seems like SYNJ2BP dots are not co-localized with mito. Is this protein targeted to another organelle beside mitochondria?

      • It is not described that SYNJ2BP would be targeted to another organelle beside mitochondria. It is possible that those dots outside of mitochondria could be non-specific signals from the antibody we used.

      • Fig4A: can the author provide a control of protein loading (membrane staining as example) to confirm the decrease of E-Syt1 in siSYNJ2BP?

      • As we performed this experiment only once we have removed the statement suggesting a decrease in ESYT1 protein in response to the siSYNJ2BP.

      • Fig5E/F: it is not clear to me why the expression of E-Syt1 in the KO is not able to complement the KO phenotype for cytosolic Ca++. Can the authors comment this?

      • We performed further analysis using ATP to trigger calcium release from the ER (Figure 6 F to H). In those conditions, expression of ESYT1 in the KO is able to complement the KO phenotype for cytosolic Ca2+.

      Reviewer 3:

      Main points 1. Confirming the MERC localization of ESYT1 should include some more of tethering factors as demonstrated interactors (some are mentioned above) and should not be limited to lipid homeostasis.

      • As shown in Figure 1B, VAPB, PDZD8 and BCAP31 are found as preys in the ESYT1 bioID analysis. Those proteins have been described as MERC tethers, their loss leading to mitochondrial calcium defects. To support and confirm the specificity of ESYT1-SYNJ2BP complex at MERCs, we performed a supplementary BioID analysis using ER targeted BirA* and OMM targeted BirA* (Table S1, Figure S1 and Figure 1 A and B). These results confirmed the specificity of the interaction of the 2 partners. ESYT1 is not identified as a prey in OMM BioID and SYNJ2BP is not identified in ER BioID. Additional ER-mitochondria tether BirA* analyses showed that tether-BirA* identified both ESYT1 and SYNJ2BP as a prey at MERCs, confirming the localisation of this interaction. Interestingly, a large majority of the known MERCs tethers VAPB-PTPIP51, MFN2, ITPRs, BCAP31 are also found as preys in the tether-BirA* (Figure 1B), confirming the quality of these data.
      • To confirm the interaction of the 2 partners, we performed co-immunoprecipitation of the ESYT1-3xFlag protein. SYNJ2BP is found as the strongest prey, followed by ESYT2 and SEC22B two described interactors of ESYT1, confirming the quality of the analysis (Table S2) (Giordano, Saheki et al. 2013, Gallo, Danglot et al. 2020).

      The fact that in ESYT1 KO cells both mitochondrial calcium transfer and cytosolic calcium accumulation are accompanied by decreased ER-cepia1ER signal decay upon histamine addition suggest that the main reason for ER-mitochondria calcium transfer defects are due to impaired SOCE. Calcium-free medium and histamine are used to show that ESYT1 does not affect ER calcium content. However, if it affects SOCE, then the absence of extracellular calcium would abolish such an effect; moreover, histamine does not test for leak effects. As additional information, the authors should investigate whether ER calcium content is affected by the presence of extracellular calcium in the ko scenario using thapsigargin. The authors should inhibit SOCE to test whether this mechanism is affected in ESYT1 KO and could account for observed signal differences. Excluding SOCE is critical, since any change in calcium entry from the outside would potentially negate a role of ESYT1 in mitochondrial calcium uptake.

      • Silencing ESYT1 impairs SOCE efficiency in Jurkat cells (Woo, Sun et al. 2020), but not in HeLa cells (Giordano, Saheki et al. 2013, Woo, Sun et al. 2020). Analysis of the role of ESYT1 in HeLa cells prevents confounding effects due to the loss of ESYT1 at ER-PM. In this model, knock-down of ESYT1 led to a decrease of mitochondrial Ca2+ uptake from the ER upon histamine stimulation, as monitored by genetically encoded Ca2+ indicator targeted to mitochondrial matrix (Figure 5A and B). ESYT1 silencing in HeLa cells did not impact ER Ca2+ store measured by the ER-targeted R-GECO Ca2+ probe (Figure 5C and D). The expression of the artificial mitochondria-ER tether was able to rescue mitochondrial Ca2+ defects observed in ESYT1 silenced cells (Figure 5B), confirming that the observed anomalies are specifically due to MERC defects.
      • In contrast, loss of ESYT1 impaired SOCE efficiency in fibroblasts (Figure 6 A and B). This phenotype was fully rescued by re-expression of ESYT1-Myc but not the artificial tether. We therefore investigated the influence of ESYT1 loss on cytosolic Ca2+ concentration following ATP (Figure 6F to H) or histamine stimulation (Figure S3 D to F), both of which showed a reduced cytosolic Ca2+ concentration and uptake in ESYT1 KO cells. This phenotype was fully rescued by the re-expression of ESYT1-Myc but not the artificial tether. Measurement of cytosolic Ca2+ in Ca2+-free media after treatment with thapsigargin, an inhibitor of the sarco/endoplasmic reticulum Ca2+ ATPase SERCA that blocks Ca2+ pumping into the ER, showed that ESYT1 KO does not influence the total ER Ca2+ pool (Figure 6K and L). However, ER-Ca2+ release capacity upon histamine stimulation (Figure 6I and J) is decreased in ESYT1 KO cells. This phenotype was fully rescued by the re-expression of ESYT1-Myc but not the artificial tether. Loss of ESYT1 decreased the Ca2+ uptake capacities of mitochondria after activation with histamine (Figure S3 A to C) or ATP (Figure 6 C to E). This phenotype was rescued by re-expression of ESYT1-Myc and also the engineered ER-mitochondria tether. Thus, despite the ER-Ca2+ release defect observed after ESYT1 loss, the artificial tether fully rescued the mitochondrial phenotype.
      • These results highlight the distinct and dual roles of ESYT1 in Ca2+ regulation at the ER-PM and at MERCs.

      The authors claim that ER-Geco measurements show that no change of ER calcium was observed. However, they use thapsigargin treatment and then get a peak, when the signal should show a decrease due to leak. This suggests they did not use ER-Geco in Figure S3C. What was measured and what does it mean?

      • We used R-GECO (not ER-GECO) which measures the cytosolic calcium.
      • We measured the total ER Ca2+ store using the cytosolic-targeted R-GECO Ca2+ probe upon treatment with thapsigargin, an inhibitor of the sarco/endoplasmic reticulum Ca2+ ATPase SERCA that blocks Ca2+ pumping into the ER (Figure 5C and D), and observed no difference between our different conditions.

      The findings on growth in galactose medium are intriguing but are not accompanied by respirometry to confirm mitochondria are compromised upon ESYT1 KO.

      • We decided to remove the data on the metabolic consequences of ESYT1 loss since they were too preliminary and required deeper investigation, focusing instead on the effect of ESYT1 loss on calcium homeostasis.

      Minor points: 1. The authors mention they measure mitochondrial uptake of "exogenous" calcium by applying histamine. They should specify that these measures transferred calcium from the ER rather than uptake of calcium from the exterior (directly at the plasma membrane).

      • The text was clarified as suggested.

      • Expression levels of IP3Rs are not very indicative of any change of their activity. The authors should discuss how ESYT1 could affect their PTMs.

      • A large number of post-translational modifications are known to regulate IP3R activity (Hamada and Mikoshiba 2020), and it is possible that the loss of ESYT1 could interfere with these modifications, but an exploration of this issue is beyond the scope of this study. The text was clarified as suggested.

      Eisenberg-Bord, M., N. Shai, M. Schuldiner and M. Bohnert (2016). "A Tether Is a Tether Is a Tether: Tethering at Membrane Contact Sites." Dev Cell 39(4): 395-409.

      Gallo, A., L. Danglot, F. Giordano, B. Hewlett, T. Binz, C. Vannier and T. Galli (2020). "Role of the Sec22b-E-Syt complex in neurite growth and ramification." J Cell Sci 133(18).

      Giordano, F., Y. Saheki, O. Idevall-Hagren, S. F. Colombo, M. Pirruccello, I. Milosevic, E. O. Gracheva, S. N. Bagriantsev, N. Borgese and P. De Camilli (2013). "PI(4,5)P(2)-dependent and Ca(2+)-regulated ER-PM interactions mediated by the extended synaptotagmins." Cell 153(7): 1494-1509.

      Hamada, K. and K. Mikoshiba (2020). "IP(3) Receptor Plasticity Underlying Diverse Functions." Annu Rev Physiol 82: 151-176.

      Ilacqua, N., I. Anastasia, D. Aloshyn, R. Ghandehari-Alavijeh, E. A. Peluso, M. C. Brearley-Sholto, L. V. Pellegrini, A. Raimondi, T. Q. de Aguiar Vallim and L. Pellegrini (2022). "Expression of Synj2bp in mouse liver regulates the extent of wrappER-mitochondria contact to maintain hepatic lipid homeostasis." Biol Direct 17(1): 37.

      Pourshafie, N., E. Masati, A. Lopez, E. Bunker, A. Snyder, N. A. Edwards, A. M. Winkelsas, K. H. Fischbeck and C. Grunseich (2022). "Altered SYNJ2BP-mediated mitochondrial-ER contacts in motor neuron disease." Neurobiol Dis: 105832.

      Rios, K. E., M. Zhou, N. M. Lott, C. R. Beauregard, D. P. McDaniel, T. P. Conrads and B. C. Schaefer (2022). "CARD19 Interacts with Mitochondrial Contact Site and Cristae Organizing System Constituent Proteins and Regulates Cristae Morphology." Cells 11(7).

      Scorrano, L., M. A. De Matteis, S. Emr, F. Giordano, G. Hajnoczky, B. Kornmann, L. L. Lackner, T. P. Levine, L. Pellegrini, K. Reinisch, R. Rizzuto, T. Simmen, H. Stenmark, C. Ungermann and M. Schuldiner (2019). "Coming together to define membrane contact sites." Nat Commun 10(1): 1287.

      Woo, J. S., Z. Sun, S. Srikanth and Y. Gwack (2020). "The short isoform of extended synaptotagmin-2 controls Ca(2+) dynamics in T cells via interaction with STIM1." Sci Rep 10(1): 14433.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We are grateful to both reviewers for reviewing our manuscript, and for providing very helpful feedback as to how we can improve this work. We have now implemented nearly all of the changes as recommended, and provide responses to these points below.

      In terms of novelty, while recent pre-prints and publications have suggested that the application of multi-omics analysis improves GRN inference, there has yet to be a systematic comparison of linear and non-linear machine learning methods for GRN prediction from single cell multi-omic data. There are many computational and statistical challenges to such a study, and we therefore believe that others in the field will be especially interested in our systematic comparison of network inference methods, especially given the increased interest and utility of multi-omic data.

      In addition, we report the first comprehensive inference of GRNs in early human embryo development. This developmental context is particularly challenging to study given genetic variation, limitations of sample size due to the precious nature of the material and regulatory constraints. We anticipate that the methodology we developed and datasets we generated will be informative for computational, developmental and stem cell biologists.

      We have uploaded all the network predictions on FigShare and these can be accessed using the following link: https://doi.org/10.6084/m9.figshare.21968813. In addition, we anticipate that the computational and statistical codes and pipelines we developed (available on https://github.com/galanisl/early_hs_embryo_GRNs) will be applied to other cellular and developmental contexts, especially in challenging contexts such as human development, non-typical model organisms and in clinically relevant samples.

      Reviewer 1

      Major comments

      - The proposed strategy (i.e. combining gene expression-based regulatory inference with cis-regulatory evidence) has been well developed (and implemented) by multiple published works like SCENIC and CellOracle, which is also properly acknowledged by the authors in the discussion section. This leads to a serious concern on the major methodological contribution of this work.

      We would like to note that our study is the first to comprehensively evaluate linear and non-linear machine learning strategies for gene regulatory network prediction from single-cell transcriptional datasets combined with available multi-omic data. We also apply these methods to early human embryogenesis, a context that is challenging to study. There are specific methodological challenges arising in this context that other published work has not yet addressed. In particular, the precious nature of the source material means that sample sizes are limited, unlike the contexts where SCENIC and CellOracle were applied. Notably, the number of cells available for downstream analysis is typically several orders of magnitude lower than when scRNA-seq data are collected from adult human tissue or from cell culture. This restriction on sample sizes places corresponding restrictions on statistical power, and is therefore likely to mean that different statistical network inference methodologies are optimal in specific contexts. Furthermore, the inclusion of multi-omic data from complementary platforms (such as ATAC-seq data) becomes even more important in this context to mitigate the effect of reduced sample sizes. These issues are very important for the choice of gene regulatory network inference methodology in relation to studies of human embryo development, and ours is the first study to address these issues directly in any context. We have further clarified the novelty of our work in the abstract, introduction and discussion sections of the manuscript.

      - Most of the compared network reconstruction methods involve hyper-parameter setup (e.g., sparsity regularization weights of the regression methods). The authors did not discuss how these hyper-parameters were chosen.

      For sparse regression, the hyperparameter controlling sparsity was set by cross-validation (CV), using the internal CV function of the R package. All default settings for GENIE3 were used. This information has now been added to the manuscript (in the Methods section), along with a description of the implementation of the mutual information method we use.

      - For the real-world blastocyst data, the network prediction methods were compared in terms of their reproducibility across validation folds (Fig. 3, Fig. S4-6). However, reproducibility does not necessarily imply accuracy. In fact, statistical learning methods are generally subject to the bias-variance tradeoff, where lower variance (i.e., higher reproducibility) could imply higher bias in model prediction. While there is a lack of gold-standard ground truth to evaluate network accuracy in real biological systems, silver-standards like the ranking of known regulatory interactions in the predictions could be employed as an indirect estimate.

      We thank the reviewer for the opportunity to clarify this point. We would like to avoid any misunderstanding of the reproducibility statistic R, as follows. A higher value of R indicates that the fitted model would generalise well to new data; i.e., R=1 indicates that the model is robust (stable) to perturbations of the data-set. We note that this is not the same as analysing the residual variance of the data after model fitting and related over-fitting (i.e., bias-variance trade-off). The variance that is referred to when discussing bias-variance trade-off is the mean-squared error (of data compared to model), which is not the same as what is assessed by the reproducibility statistic R. Specifically, R is a Bayesian estimate of the posterior probability of observing a gene regulation given the data. R is calculated by taking a random sample of the data, repeating the network inference, checking whether each gene regulation still appears in the GRN, and then recording (as the R statistic) the average fraction of inclusions over many repetitions. So when R is close to 1, this indicates that our model predictions generalise well to new data, which is the opposite of what is suggested in this comment. In summary, the accuracy quantified by the reproducibility statistic R relates to the stability of the model predictions to perturbation of the data. We thank the reviewer for the helpful comment drawing our attention to this point, and have now clarified it in the manuscript on page 6, line 252.
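      The resampling procedure behind R can be sketched roughly as follows. This is an illustration only, not the code used in the study: the correlation-threshold `infer_edges` function is a toy stand-in for a real GRN inference method, and the subsample fraction, number of repetitions, and simulated data are all assumptions.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def infer_edges(expr, threshold=0.6):
    """Toy stand-in for GRN inference (assumption): keep gene pairs whose
    absolute correlation across cells reaches the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


def reproducibility(expr, n_cells, n_repeats=200, fraction=0.5, seed=0):
    """For each edge, the fraction of random cell subsamples in which
    the edge is inferred again."""
    rng = random.Random(seed)
    counts = {}
    k = max(3, int(fraction * n_cells))
    for _ in range(n_repeats):
        cells = rng.sample(range(n_cells), k)
        sub = {g: [v[c] for c in cells] for g, v in expr.items()}
        for edge in infer_edges(sub):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / n_repeats for edge, c in counts.items()}


# Simulated expression: B tracks A closely, C is unrelated to both.
rng = random.Random(1)
n_cells = 60
a = [rng.gauss(0, 1) for _ in range(n_cells)]
expr = {
    "A": a,
    "B": [v + rng.gauss(0, 0.2) for v in a],
    "C": [rng.gauss(0, 1) for _ in range(n_cells)],
}
r_stat = reproducibility(expr, n_cells)
for edge, r in sorted(r_stat.items()):
    print(edge, round(r, 2))
```

      In this toy setting the A-B edge is recovered in essentially every subsample (R close to 1), whereas any edge involving C appears rarely, if at all.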

      - The gene set enrichment results were reported only on EPI and TE cell types (Fig. 4C and Fig. S12), due to the reason that CA data is only available for TE and ICM. However, many of the other results presented in Fig. 3-6 did include the PE cell type albeit using the same CA data. It is not particularly convincing why the cell type inclusion standard for gene set enrichment is different from the other results.

      We thank the reviewer for noting this and would like to clarify that we restricted the analysis to the EPI and TE, because similar lists of gene-sets were not available for primitive endoderm, where it is currently unclear which pathways are most relevant to this cell type. This has now been clarified in the manuscript on page 8, line 337.

      - The authors cited TF binding in cis-regulatory regions as supporting evidence of several MICA-inferred regulatory interactions (e.g., NANOG -> ZNF343). However, the same cis-regulatory evidence has already been used in the CA filtering step. All interactions passing CA filtering should in principle have TF-binding support. It would be more convincing if the authors provided other types of evidence as independent support, such as genetic associations like eQTL, experimental perturbations like gene knockdown/knockout, etc.

      We appreciate the reviewer’s point. We address this by describing published ChIP-seq validation in human pluripotent stem cells, which are widely used as a proxy for the study of the epiblast. We feel that the ChIP-seq validation in this context is an appropriate independent validation to support the MICA-inferred cis-regulatory interactions predicted from the human embryo datasets we analysed. Our inferences from ATAC-seq data cannot identify TF-DNA binding directly. ChIP-seq is a widely accepted independent method to support interactions inferred from ATAC-seq data.

      We agree that knockdown/knockout would provide further evidence suggesting gene regulation, and indeed these are experiments we would like to conduct systematically in the future, but such perturbations are difficult to achieve at genome-wide scale, especially with very restricted quantities of human embryo material. Notably, these studies would not be evidence of direct regulation and the gold-standard in our opinion is to perturb the cis regulatory region to demonstrate its functional importance in gene regulation. These are important experiments to conduct systematically in the future. We also note that assessing quantitative trait loci in the context of human pre-implantation embryos is extremely challenging due to the restricted sample sizes and genetic variance in the samples collected.

      - Many of the MICA-inferred regulatory interactions do not exhibit Spearman correlation (Fig. 5, Fig. S17), which could probably be explained by the ability of mutual information to capture complex non-monotonic dependencies. It would be interesting to provide further investigation on these "uncorrelated" edges, which may help demonstrate the superiority of mutual information over Spearman correlation.

      This has been added as a new Fig.S18.
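      The reviewer's point about non-monotonic dependencies can be illustrated with a small simulated example. The sketch below is not taken from the manuscript: the quadratic relationship, the noise level, and the simple equal-frequency-bin estimator of mutual information are all assumptions chosen for illustration.

```python
import math
import random


def ranks(values):
    """0-based ranks; ties are assumed absent (continuous toy data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mean = (len(x) - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var


def mutual_information(x, y, bins=8):
    """Plug-in mutual information estimate (in nats) on equal-frequency bins."""
    n = len(x)
    bx = [r * bins // n for r in ranks(x)]
    by = [r * bins // n for r in ranks(y)]
    joint = {}
    for i, j in zip(bx, by):
        joint[(i, j)] = joint.get((i, j), 0) + 1
    px = [bx.count(b) / n for b in range(bins)]
    py = [by.count(b) / n for b in range(bins)]
    return sum(
        (c / n) * math.log((c / n) / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


rng = random.Random(0)
n = 1000
x = [rng.uniform(-1, 1) for _ in range(n)]
y = [v * v + rng.gauss(0, 0.02) for v in x]  # non-monotonic dependence on x

y_shuffled = y[:]
rng.shuffle(y_shuffled)  # destroys the dependence; serves as a baseline

rho = spearman(x, y)
mi = mutual_information(x, y)
mi_null = mutual_information(x, y_shuffled)
print(f"Spearman rho = {rho:.3f}, MI = {mi:.3f} nats, shuffled MI = {mi_null:.3f} nats")
```

      Here the rank correlation stays near zero because the relationship rises and falls symmetrically, while the mutual information is well above the shuffled baseline.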

      - The authors conducted immunostaining experiments to validate the MICA-inferred regulatory interaction between TFAP2C and JUND. While the identified protein co-localization is a step further than RNA co-expression, it is still correlation rather than causality. Additional evidence like the effect of knockout/knockdown perturbations would be more convincing.

      We agree with Reviewer 1 that experimental perturbations of TFAP2C and JUND to determine what consequence this has for interactions between these proteins would be informative. However, due to the complexity of such an investigation in human embryos, we feel that this is beyond the scope of the current study. One option is to conduct the perturbations in human pluripotent stem cells; however, it is unclear whether the GRN in this context reflects the same interactions as in human embryos, and this is a distinct question to address in the future. Moreover, while knockdown/knockout studies would be suggestive of upstream regulation, they would not address the question of whether this is a direct or indirect effect without systematic further analysis, including transcription factor-DNA binding analysis (such as CUT&RUN, CUT&Tag or ChIP-seq) as well as perturbations of the putative cis-regulatory regions. These are all exciting future experiments, and our study provides us and others with hypotheses to test functionally. We have clarified these future directions in the discussion section on page 16, line 576.

      Minor comments

      • The γ symbols in AP-2γ are not correctly rendered.

      We note that this applies only to the way AP-2γ appears on the Review Commons website, and we are trying to fix this issue. We hope this transformation after the manuscript upload will not apply to a subsequent transfer to a journal.

      • The UMAP figures (Fig. 4A, Fig. S7) are of low resolution compared to other figures.

      We thank the reviewer for noting this. These figures have now been added as vector graphics files to overcome this issue.

      • As the authors are focused on studying the blastocyst regulatory network, the inferred regulatory interactions should be provided as supplementary data.

      We have included all of the inferred gene regulatory interactions as a supplementary folder for the MICA predictions using FigShare: doi.org/10.6084/m9.figshare.21968813. We have included code to reproduce the inferred gene regulatory interactions for the other methods which we compared to MICA. Because this includes 100,000 regulatory interactions per method, we feel that it would be impractical to include the alternative inferred interaction as supplementary data.

      Reviewer 2

      Minor comments

      - In the abstract, it would be adequate to already mention which normalisation method works the best.

      This has now been added to the abstract and we appreciate this suggestion.

      - In Fig. 1:

      * Describe what the squares and circles are.

      This information has been included in the figure 1 legend.

      * In the GRNs refined by keeping CA-predicted regulations only, mention that these are cis interactions.

      We have modified the figure 1 legend and the text on page 5, line 224 to clarify that these are putative cis-regulatory interactions.

      * The ATAC seq shows KRT8, GATA3, RELB motifs, while the rest of the figure is very general. Maybe make the ATAC-seq peaks panel also as a sketch and relate it to the square/circles graphs on the right hand side to showcase how the filtering of the network is performed.

      We appreciate this suggestion and modified figure 1 accordingly.

      * The caption says five GRN inference approaches, while the abstract and text say 4. It is clear after reading that the 5th is a random approach. However, it was a surprise at first.

      We have modified the figure 1 legend to clarify that we also compared random prediction in addition to the 4 GRN inference approaches.

      - How the simulation study was performed is not understandable for non-experts as it is described in the Methods section. This is an important approach in general, and I think the audience would benefit if the authors add a full section about it in their supplementary data.

      Further details have now been added to the subsection ‘simulation study’ in the Methods section.

      - Fig. 2:

      * As it is, it is hard to tell the difference between GRN inference methods for a given sample size and number of regulators. Could the authors add a comparative panel for this (maybe some scatter plots would be enough)? MI by itself looks worse here?

      We thank the reviewer for this helpful suggestion. This comparative plot has now been included in figure 2 and indicates that MI is on par with the other GRN inference methods on simulated RNA-seq data.

      - When mentioning "samples" (e.g. last paragraph of section 1 in results), do the authors refer to "cells"?

      We appreciate the reviewer pointing this out and have amended the text throughout to state that these are cells.

      - What about normalisation effects in the simulated data?

      With regards to the simulated data, normalisation effects are not relevant as we are generating data that are idealised and therefore not subject to unwanted sources of variation such as read depth. However, in future work, this could be investigated with an expanded simulation study and we appreciate the reviewer’s suggestion.

      - Figure S7 should be cited in the first paragraph of section 2 in results.

      This has now been cited.

      - Could the authors add a panel to indicate whether the data is SMART-seq2 or 10X?

      We thank the reviewer for the suggestion to clarify this, which we think is an important point. We have included a statement that all data used was generated using the SMART-seq2 sequencing technique in the figure legend. The choice of sequencing method/depth of sequencing will likely impact on the choice of GRN inference method and we have also clarified this in the discussion section on page 13, line 516.

      - In the association of inferred GRNs to human blastocyst cell lineages, the authors find the GRN edges predicted that overlap between the 4 inference methods in each cell type. Do they, therefore, recommend to always use more than one GRN inference method?

      Identifying overlapping inferences by comparing more than one GRN inference method may be a strategy to identify network edges with more confidence due to the agreement between several inference methodologies. However, this strategy may also miss some edges which can only be detected by one method and not another. We have included a statement in the discussion section to clarify this point on page 15, line 571.

      - If the CA data used was only generated for the TE and ICM only, how do the authors use it to perform MICA on PE?

      We appreciate that this is confusing and have since revised the manuscript on page 5, line 223 to state that the inner cell mass (ICM), comprises EPI (epiblast) and PE (primitive endoderm) cells. It may be that we miss putative cis-regulatory interactions if the ICM CA data does not reflect developmentally progressed PE and EPI cells and we have noted this caveat in the discussion section on page 15, line 561.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      She et al studied the evolution of gene expression reaction norms when individuals colonise a new environment that exposes them to physiologically challenging conditions. Their objective was to test the "plasticity first" hypothesis, which suggests that traits that are already plastic (their value changes when facing a new environment compared to the original environment) facilitate the colonisation of novel environments, which, if true, would be predicted to result in the evolution of gene expression values that are similar in the population that colonised the new environment and evolved under these particular selection pressures. To test this prediction, they studied gene expression in cardiac and muscle tissues in individuals originating from three conditions: lowland individuals in their natural environment (ancestral state), lowland individuals exposed to hypoxia (the plastic response state), and a highland population facing hypoxia for several generations (the coloniser state). They classified gene expression patterns in lowland individuals responding to short-term hypoxia as maladaptive or adaptive, using genes that differed between the ancestral state (lowland) and colonised state (highland). Genes expressed in the same direction in lowland individuals facing hypoxia (the plastic state) as what is found in the colonised state are defined as adaptive, while genes with the opposite expression pattern were labelled as maladaptive, using the assumption that the colonised state must represent the result of natural selection. Furthermore, genes could be classified as representing reversion plasticity when the expression pattern differed between the plasticity and colonised states and as reinforcement when they were in the same direction (for example more expressed in the plastic state and the colonised state than in the ancestral state). They found that more genes had a plastic expression pattern that was labelled as maladaptive than adaptive. Therefore, some of the genes have an expression pattern in accordance with what would be predicted based on the plasticity-first hypothesis, while others do not.

      Thank you for a precise summary of our work. We appreciate the very encouraging comments recognizing the value of our work. We have addressed concerns from the reviewer in greater detail below.

      Q1. As pointed out by the authors themselves, the fact that temperature was not included as a variable, which would make the experimental design much more complex, misses the opportunity to more accurately reflect the environmental conditions that the colonizer individuals face at high altitude. Also pointed out by the authors, the acclimation experiment in hypoxia lasted 4 weeks. It is possible that longer term effects would be identifiable in gene expression in the lowland individuals facing hypoxia on a longer time scale. Furthermore, a sample size of 3 or 4 individuals per group depending on the tissue for wild individuals may miss some of the natural variation present in these populations. Stating that they have a n=7 for the plastic stage and n= 14 for the ancestral and colonized stages refers to the total number of tissue samples and not the number of individuals, according to supplementary table 1.

      We share the reviewer's concerns. These limitations arise partly because it is quite challenging to bring wild birds into captivity for hypoxia acclimation experiments; keeping lowland sparrows under hypoxic conditions for a month required considerable effort. We recognize the limitations the reviewer pointed out and have discussed them in the manuscript, i.e., the consideration of hypoxia alone, the short acclimation period, etc. Regarding sample sizes, we collected cardiac muscle from nine individuals (three per stage) and flight muscle from 12 individuals (four per stage). We have clarified this in Supplementary Table 1.

      Q2. Finally, I could not find a statement indicating that the lowland individuals placed in hypoxia (plastic stage) were from the same population as the lowland individuals for which transcriptomic data was already available, used as the "ancestral state" group (which themselves seem to come from 3 populations, Qinhuangdao, Beijing, and Tianjin, according to supplementary table 2), nor whether they were sampled at the same time of year (pre-reproduction, during breeding, after, or if they were juveniles, proportion of males or females, etc). These two aspects could affect gene expression (through neutral or adaptive genetic variation among lowland populations, through environmental effects other than hypoxia that differ between these populations' environments, or because of sex or age). This could potentially also affect the FST analysis done by the authors, which they use to claim that strong selective pressure acted on the expression level of some of the genes in the colonised group.

      The reviewer asked how the individual tree sparrows used in the transcriptomic analyses were collected. The individuals used for the hypoxia acclimation experiment and those representing the ancestral lowland population were collected from the same locality (Beijing) and in the same season (i.e., pre-breeding) of the year. They are all adults and weigh approximately 18 g. We have clarified this in Supplementary Table S1 and the Methods. We did not distinguish males from females (both sexes look similar), under the assumption that both sexes respond similarly to hypoxia acclimation in their cardiac and flight muscle gene expression.

      Supplementary Table 2 lists the individuals that were used for sequence analyses. These individuals were used only for sequence comparisons, not for the transcriptomic analyses. The population genetic structure analyzed in a previously published study showed that there is no clear genetic divergence within the lowland population (i.e., individuals collected from Beijing, Tianjin and Qinhuangdao) or the highland population (i.e., Gangcha and Qinghai Lake). In addition, there was no clear genetic divergence between the highland and lowland populations (Qu et al. 2020).

      Q4. Impact of the work

      There has been work showing that populations adapted to high altitude environments show changes in their hypoxia response that differ from the short-term acclimation response of lowland populations of the same species. For example, in humans, see Erzurum et al. 2007 and Peng et al. 2017, where they show that the hypoxia response cascade, which starts with the gene HIF (Hypoxia-Inducible Factor) and includes the EPO gene, which codes for erythropoietin, which in turn activates the production of red blood cells, is LESS activated in high altitude individuals compared to the activation level in lowland individuals (which gives it its name). The present work adds to this body of knowledge, showing that the short-term response to hypoxia and the long-term one can affect different pathways, and that acclimation/plasticity does not always predict what physiological traits will evolve in populations that colonize these environments over many generations and under additional selection pressures (UV exposure, temperature, nutrient availability). Altogether, this work provides new information on the evolution of reaction norms of genes associated with the physiological response to one of the main environmental variables that affects almost all animals, oxygen availability. It also provides an interesting model system to study this type of question further in a natural population of homeotherms.

      Erzurum, S. C., S. Ghosh, A. J. Janocha, W. Xu, S. Bauer, N. S. Bryan, J. Tejero et al. "Higher blood flow and circulating NO products offset high-altitude hypoxia among Tibetans." Proceedings of the National Academy of Sciences 104, no. 45 (2007): 17593-17598.

      Peng, Y., C. Cui, Y. He, Ouzhuluobu, H. Zhang, D. Yang, Q. Zhang, Bianbazhuoma, L. Yang, Y. He, et al. 2017. Down-regulation of EPAS1 transcription and genetic adaptation of Tibetans to high-altitude hypoxia. Molecular biology and evolution 34:818-830.

      Thank you for highlighting the potential novelty of our work in the context of the broader field. We found it very interesting to discuss our results (from a bird species) together with similar findings from humans. In the revised version of the manuscript, we have discussed the short-term acclimation response and long-term adaptive evolution to a high-elevation environment, as well as how our work improves understanding of the relative roles of short-term plasticity and long-term adaptation. We appreciate the two important studies pointed out by the reviewer and have cited them in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      This is a well-written paper using gene expression in tree sparrow as model traits to distinguish between genetic effects that either reinforce or reverse initial plastic response to environmental changes. Tree sparrow tissues (cardiac and flight muscle) collected in lowland populations subject to hypoxia treatment were profiled for gene expression and compared with previously collected data in 1) highland birds; 2) lowland birds under normal condition to test for differences in directions of changes between initial plastic response and subsequent colonized response. The question is an important and interesting one but I have several major concerns on experimental design and interpretations.

      Thank you for a precise summary of our work and constructive comments to improve this study. We have addressed your concerns in greater detail below.

      Q1. The datasets consist of two sources of data. The hypoxia treated birds collected from the current study and highland and lowland birds in their respective native environment from a previous study. This creates a complete confounding between the hypoxia treatment and experimental batches that it is impossible to draw any conclusions. The sample size is relatively small. Basically correlation among tens of thousands of genes was computed based on merely 12 or 9 samples.

      We appreciate the critical comments from the reviewer. The reviewer raised the concerns about the batch effect from birds collected from the previous study and this study. There is an important detail we didn’t describe in the previous version. All tissues from hypoxia acclimated birds and highland and lowland birds have been collected at the same time (i.e., Qu et al. 2020). RNA library construction and sequencing of these samples were also conducted at the same time, although only the transcriptomic data of lowland and highland tree sparrows were included in Qu et al. (2020). The data from acclimated birds have not been published before.

      In the revised version of the manuscript, we also compared log-transformed transcripts per million (TPM) across all genes and determined the most conserved genes (i.e., coefficient of variation ≤ 0.3 and average TPM ≥ 1 for each sample) for the flight and cardiac muscles, respectively (Hao et al. 2023). We compared the median expression levels of these conserved genes and found no difference among the lowland, hypoxia-exposed lowland, and highland tree sparrows (Wilcoxon signed-rank test, P<0.05). As these results suggested little batch effect on the transcriptomic data, we used TPM values to calculate gene expression level and intensity. This methodological detail has been further clarified in the Methods, and we also provide a new supplementary figure (Figure S5) to show the comparative results.
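      The conserved-gene filter described above can be sketched as follows. This is a hedged illustration rather than the pipeline used in the study: the gene names, sample labels and TPM values are hypothetical, and the sketch assumes that the coefficient of variation is computed across samples on TPM values and that the TPM floor must be met in every sample.

```python
from statistics import mean, median, pstdev


def conserved_genes(tpm, max_cv=0.3, min_tpm=1.0):
    """Return genes whose coefficient of variation across samples is at most
    max_cv and whose TPM is at least min_tpm in every sample (assumed reading
    of the filter)."""
    keep = []
    for gene, values in tpm.items():
        if min(values) < min_tpm:
            continue
        cv = pstdev(values) / mean(values)
        if cv <= max_cv:
            keep.append(gene)
    return keep


# Hypothetical samples and TPM values, for illustration only.
samples = ["lowland_1", "lowland_2", "hypoxia_1", "hypoxia_2", "highland_1", "highland_2"]
tpm = {
    "geneA": [10.0, 11.0, 9.5, 10.5, 10.2, 9.8],    # stable
    "geneB": [50.0, 48.0, 52.0, 51.0, 49.0, 50.5],  # stable
    "geneC": [5.0, 40.0, 2.0, 60.0, 1.0, 30.0],     # highly variable
    "geneD": [0.2, 0.3, 0.1, 0.4, 0.2, 0.3],        # below the TPM floor
}

kept = conserved_genes(tpm)

# Median expression of the conserved genes in each sample; similar medians
# across groups would be consistent with little batch effect.
per_sample_median = {
    s: median(tpm[g][i] for g in kept) for i, s in enumerate(samples)
}
print(kept)
print(per_sample_median)
```

      The per-sample medians could then be compared between groups with a rank-based test of the reader's choice.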

      The reviewer also raised the issue of sample size. We certainly would have liked to have more individuals in the study, but this was not possible due to the logistical difficulty of keeping wild birds in a common garden experiment for a long time. We have acknowledged this in the manuscript. To mitigate this, we tested the hypothesis of plasticity followed by genetic change using two different tissues (cardiac and flight muscles) and two different datasets (co-expressed gene-set and muscle-associated gene-set). As all these analyses show similar results, they indicate that the main conclusion drawn from this study is robust.

      Q2. Genes are classified into two classes (reversion and reinforcement) based on arbitrarily chosen thresholds. More "reversion" genes are found and this was taken as evidence reversal is more prominent. However, a trivial explanation is that genes must be expressed within a certain range and those plastic changes simply have more space to reverse direction rather than having any biological reason to do so.

      Thank you for the critical comments. The reviewer raised two questions, which we address separately. The first concern centered on the issue of arbitrarily chosen thresholds. In our manuscript, we used a range of thresholds, i.e., 50%, 100%, 150% and 200% of change in the gene expression levels of the ancestral lowland tree sparrow, to detect genes with reinforcement and reversion plasticity. With this design we wanted to explore the magnitudes of gene expression plasticity (i.e., Ho & Zhang 2018), and whether the strength of selection (i.e., genetic variation) changes with the magnitude of gene expression plasticity (i.e., Campbell-Staton et al. 2021).
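      A threshold-based classification of this kind can be sketched as below. The formulation is an assumption made for illustration: plastic change is taken as the difference between the plastic and ancestral expression levels, genetic change as the difference between the colonized and plastic levels, and both must exceed a fraction of the ancestral level for a gene to be classified. The gene names and expression values are hypothetical.

```python
def classify(ancestral, plastic, colonized, threshold=0.5):
    """Classify one gene as 'reinforcement', 'reversion' or 'unclassified'.

    plastic_change = plastic - ancestral
    genetic_change = colonized - plastic
    Both changes must exceed threshold * ancestral (assumed cutoff rule).
    """
    plastic_change = plastic - ancestral
    genetic_change = colonized - plastic
    cutoff = threshold * ancestral
    if abs(plastic_change) <= cutoff or abs(genetic_change) <= cutoff:
        return "unclassified"
    # Same sign: evolution pushes further in the plastic direction.
    # Opposite sign: evolution moves expression back toward the ancestral level.
    return "reinforcement" if plastic_change * genetic_change > 0 else "reversion"


# Hypothetical (ancestral, plastic, colonized) expression levels.
genes = {
    "g1": (10.0, 20.0, 30.0),
    "g2": (10.0, 20.0, 11.0),
    "g3": (10.0, 12.0, 30.0),
    "g4": (10.0, 45.0, 5.0),
}

# Repeat the classification over the range of thresholds (50% to 200%).
results = {
    t: {name: classify(*levels, threshold=t) for name, levels in genes.items()}
    for t in (0.5, 1.0, 1.5, 2.0)
}
for t, calls in results.items():
    print(t, calls)
```

      Raising the threshold shrinks the set of classified genes, so comparing the reversion-to-reinforcement ratio across thresholds shows whether the pattern depends on the cutoff.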

      As the reviewer pointed out, we now realize that this threshold selection is arbitrary. We have thus implemented two other categorization schemes to test the robustness of the observation of unequal proportions of genes with reinforcement and reversion plasticity. Specifically, we used a parametric bootstrap procedure as described in Ho & Zhang (2019), which aims to identify genes whose classification results from genuine differences rather than random sampling errors. Bootstrap results suggested that genes exhibiting reversing plasticity significantly outnumber those exhibiting reinforcing plasticity, suggesting that our inference of an excess of genes with reversion plasticity is robust to random sampling errors. We have added these analyses to the revised version of the manuscript, and provided results in Figure 2d and Figure 3d.
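      The idea behind such a bootstrap check can be sketched as follows. This is an illustrative assumption, not a reproduction of the published procedure: each stage mean is redrawn from a Gaussian centred on the observed mean with the observed standard error, the gene is reclassified, and the support is the fraction of draws that agree with the observed call. All numbers are hypothetical.

```python
import random


def direction(ancestral, plastic, colonized):
    """Sign-based call: same-direction changes reinforce, opposite ones revert."""
    product = (plastic - ancestral) * (colonized - plastic)
    if product > 0:
        return "reinforcement"
    if product < 0:
        return "reversion"
    return "none"


def bootstrap_support(means, std_errors, n_boot=1000, seed=0):
    """Return the observed call and the fraction of parametric bootstrap
    draws that reproduce it (Gaussian resampling is an assumption)."""
    rng = random.Random(seed)
    observed = direction(*means)
    hits = 0
    for _ in range(n_boot):
        draw = [rng.gauss(m, s) for m, s in zip(means, std_errors)]
        hits += direction(*draw) == observed
    return observed, hits / n_boot


# Clear reversal: large changes relative to the standard errors.
call_clear, support_clear = bootstrap_support((10.0, 20.0, 11.0), (0.5, 0.5, 0.5))

# Weak signal: differences are small relative to the standard errors.
call_weak, support_weak = bootstrap_support((10.0, 10.2, 10.1), (1.0, 1.0, 1.0))

print(call_clear, support_clear)
print(call_weak, support_weak)
```

      Only genes whose call is reproduced in a high fraction of draws (for example at least 95%, a cutoff chosen here for illustration) would be retained for counting.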

      In addition, we adopted a bin scheme (i.e., 20%, 40% and 60% bin settings along the spectrum of the reinforcement/reversion plasticity). These analyses based on different categorization schemes revealed similar results, and suggested that our inference of an excess of genes with reversion plasticity is robust. We have provided these results in Supplementary Figures S2 and S4.

      The second issue the reviewer raised is that plastic changes may simply have more room to reverse direction, rather than there being any biological reason for them to do so. While the causal reason why more genes have their expression levels reversed than reinforced at the late stages is still contentious, an increasing number of studies show that gene expression plasticity at the early stage may be functionally maladaptive in the novel environments that species have recently colonized (i.e., lizard, Campbell-Staton et al. 2021; Escherichia coli, yeast, guppies, chickens and babblers, Ho and Zhang 2018; Ho et al. 2020; Kuo et al. 2023). Our comparisons based on the two gene sets associated with muscle phenotypes corroborate these previous studies and show that initial gene expression plasticity may be nonadaptive in novel environments (i.e., Ghalambor et al. 2015; Ho & Zhang 2018; Ho et al. 2020; Kuo et al. 2023; Campbell-Staton et al. 2021).

      Q3. The correlation between plastic change and evolved divergence is an artifact due to the definitions of adaptive versus maladaptive changes. For example, the definition of adaptive changes requires that plastic change and evolved divergence are in the same direction (Figure 3a), so the positive correlation was a result of this selection (Figure 3d).

      The reviewer raised the issue that the correlation between plastic change and evolved divergence (for example, Figure 3d) is an artifact of the definition of adaptive versus maladaptive changes. We agree with the reviewer that the correlation analysis is circular, because the definition of adaptive and maladaptive plasticity depends on whether the direction of plastic change matches or opposes that of the colonized tree sparrows. We have thus removed the previous Figure 3d-e and related text from the revised version of the manuscript. We have also changed Figure 3a to further clarify the schematic framework.

      Reviewer #1 (Recommendations For The Authors):

      Q1. Here are private recommendations that I think could help improve the manuscript. West-Eberhard was a pioneer back in 2003 in explicating the hypothesis of "plasticity first". I think it is important to cite their main work in the first paragraph of introduction and to use the term "plasticity-first", which is widely known among evolutionary biologists studying phenotypic plasticity, instead of "plasticity followed by genetic change", since the three papers cited in paragraph 1 call it « plasticity first ».

      West-Eberhard, M.J. (2003) Developmental Plasticity and Evolution, Oxford University Press.

      Thank you for suggesting West-Eberhard (2003) and we have cited this important work. We have also changed “plasticity followed by genetic change” to “plasticity first”.

      Q2. Introduction. Line 5, Change for « On the one hand, if plasticity changes ... »

      We have modified as suggested.

      Q3. Line 52, Change for « ...same direction as adaptive evolution does ...»

      We have modified as suggested.

      Q4. Line 66, When presenting papers that address the plasticity and evolution of gene expression in response to environmental variables, the paper by Morris et al is another example that could be useful to include (but this is only a suggestion in case the authors missed it).

      Thank you for suggesting this nice work. We have cited Morris et al. (2014).

      Q5. Line 94, Change for "We acclimated"

      We have modified as suggested.

      Q6. In Figure 3, the figure in panel A and B is labelled "normaxia", but I think that "normoxia" is usually the term used.

      Thank you for spotting the typo. We have modified Figure 3a and no longer use the term “normaxia”.

      Material and methods

      It would be important to merge supplementary table 1 and 2 and only present the individuals that were used with their respective cardiac and muscle libraries (if they come from the same individual?). Also, the origin of the individuals used in the hypoxia experiment should be explained at the beginning of the methods section and explicated in the supplementary table. Information on sex or stage of development (juvenile? adult? male? female?) and time of year (in breeding stage? pre-migration (if any), etc.) would allow the reader to see whether individuals from the lowland differed only in their exposure to hypoxia, or whether other variables may affect gene expression patterns. Similarly, if all individuals from the highland are males and the lowland hypoxia-exposed individuals are females (or juveniles versus breeders, or different time of year, etc.), this should be stated in the methods. Gene expression is labile, so the reader should know whether other variables influence the results presented.

      Thank you for the suggestion. We have added detailed information (i.e., age, collecting time and season) to Supplementary Table 1. We have also added this information to the Methods. Because the birds used in the transcriptomic analysis (Supplementary Table 1) were different individuals from those used in the sequence analyses (Supplementary Table 2), these two tables cannot be merged.

      References:

      Campbell-Staton SC, Velotta JP, Winchell KM. 2021. Selection on adaptive and maladaptive gene expression plasticity during thermal adaptation to urban heat islands. Nat. Commun. 12: 6195.

      Ghalambor CK, Hoke KL, Ruell EW, Fischer EK, Reznick DN, Hughes KA. 2015. Non-adaptive plasticity potentiates rapid adaptive evolution of gene expression in nature. Nature 525:372–375.

      Hao et al. 2023. Divergent contributions of coding and noncoding sequences to initial high-altitude adaptation in passerine birds endemic to the Qinghai–Tibet Plateau. Mol. Ecol. doi: 10.1111/mec.16942.

      Ho WC, Zhang J. 2018. Evolutionary adaptations to new environments generally reverse plastic phenotypic changes. Nat. Commun. 9: 350.

      Ho WC, Zhang J. 2019. Genetic gene expression changes during environmental adaptations tend to reverse plastic changes even after correction for statistical nonindependence. Mol. Biol. Evol. 36: 604–612.

      Ho WC, Li D, Zhu Q, Zhang J. 2020. Phenotypic plasticity as a long-term memory easing readaptations to ancestral environments. Sci. Adv. 6: eaba3388.

      Kuo KC, Yao CT, Liao BY, Weng MP, Dong F, Hsu YC, Hung CM. 2023. Weak gene-gene interaction facilitates the evolution of gene expression plasticity. BMC Biol. 21: 57.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      1. Point-by-point description of the revisions

      Reviewer #1

      Evidence, reproducibility and clarity (Required):

      In this paper by Wideman et al, the authors seek to determine the role of cellular iron homeostasis in the pathogenesis of murine malaria.

      The authors attempt to disentangle the effects of anemia from that of cellular iron deficiency. The authors elegantly make use of a murine model of a rare human mutation in the transferrin receptor. This mutation leads to decreased receptor internalization and decreased cellular iron, but otherwise healthy mice. Using this model, the authors use a P. chabaudi infection model and show an increase in pathogen burden and a decrease in pathology. They show in some detail that the immune response to P. chabaudi infection is blunted, both T and B-cell responses are attenuated in the TfRY20H/Y20H model, and the block in proliferation can be rescued by exogenous iron supplementation. They also show that decreased cellular iron attenuates liver pathology through potentially multiple mechanisms.

      Minor comments:

      • The peak of parasitemia is relatively low (approx. 3%) compared to other published studies (e.g. PMID: 22100995, 16714546, 31110285) where the peak in C57BL/6 mice reached 25 - 40%. Can the authors account for this low parasitemia?

      Response: We thank the reviewer for their constructive comments and appreciate that they are highlighting this important point. It has previously been shown (PMID: 23217144, 23719378) that mosquito-transmission of P. chabaudi leads to significantly lower parasitaemia (“Recently mosquito-transmitted parasites were used to mimic a natural infection more closely, as vector transmission is known to regulate Plasmodium virulence and alter the host’s immune response (48-50). Consequently, parasitaemia is expected to be significantly lower upon infection with recently mosquito-transmitted parasites, compared to infection with serially blood-passaged parasites that are more virulent (48,49).”).

      • Figure 1K - At homeostasis, serum iron is low in TfR mice however increases to significantly higher than the WT mice at 8 days post infection. Do the authors have an explanation on why these dramatic changes in serum iron are seen?

      Response: During malaria infection, RBC lysis releases haem and iron into circulation, which leads to an increase in serum iron levels. This effect is observed in both wild-type and TfrcY20H/Y20H mice infected with P. chabaudi (Supplementary Figure 1F & Figure 1K). However, the significantly higher serum iron levels observed in infected TfrcY20H/Y20H mice can likely be explained by their decreased capacity for transferrin receptor-1 mediated iron uptake, leading to relatively slower uptake and storage of circulating transferrin-bound iron into tissues. This has been clarified in the manuscript (line 142-143):

      “The elevated serum iron observed in infected TfrcY20H/Y20H mice was consistent with their restricted capacity to take up transferrin-bound circulating iron into tissues.”

      • Figure S3 - Is it surprising that no effects on splenic neutrophils are seen? Were neutrophils quantified at any other point? These would also be expected to have a role in both the control of malaria infection and on any pathology.

      Response: We thank the reviewer for raising this interesting question. It is known that neutrophils can be sensitive to cellular iron deficiency (PMID: 36197985) and that neutrophils can play an important part in malaria infection (PMID: 31628160). However, the magnitude and significance of the neutrophil response to recently mosquito-transmitted P. chabaudi parasites has not been thoroughly investigated. A recent study demonstrated that monocytes and macrophages may be more important than granulocytes in the early response to recently mosquito-transmitted P. chabaudi infection (PMID: 34532703).

      Moreover, we performed neutrophil quantifications in our initial experiments and found that the splenic neutrophil response was not altered in TfrcY20H/Y20H mice eight days after infection. Additionally, no neutrophil infiltration was observed in the liver of either genotype upon P. chabaudi infection. In light of these findings, we did not characterise the neutrophil response further, as it appeared unlikely that neutrophils were the principal causal agent of either the altered immunity or pathology in this context. However, we agree with the reviewer that the larger question of whether neutrophil iron plays a role in the pathology of malaria remains open, and we hope future studies can elucidate it.

      A section was added to the discussion to address the role of innate immune cells in our model (line 354-363):

      “The inhibited innate immune response to P. chabaudi in TfrcY20H/Y20H mice likely contributed to both the increased pathogen burden and the decreased liver pathology. Splenic MNPs are important for controlling parasitaemia (34,35,72), but MNPs are also vital for maintaining tissue homeostasis and preventing tissue damage in malaria (43,73). Although other innate cells, such as neutrophils, NK cells and γδT cells are an important part of the immune response to malaria, only the MNP response was distinctly impaired in TfrcY20H/Y20H mice. Notably, neutrophils are known to be sensitive to iron deficiency (16,74) and to affect both immunity and pathology in malaria (75,76). However, in the context of recently mosquito-transmitted P. chabaudi it appears that monocytes and macrophages, rather than granulocytes, may be particularly important for parasite control and tissue homeostasis (43,72).”

      Changes to the text:

      • Fig S1E and F - Please add to the figure legend that these were measured at homeostasis.

      Response: This clarification has been added to the legend of Supplementary Figure 1 (line 954-957).

      • Figure 3 - In the legend, H and I are the wrong way around.

      Response: The legend of Figure 3 has been corrected accordingly (line 888-890).

      • Figure 4 - please add the units of concentration of FeSO4 to all panels

      Response: The units of concentration for FeSO4 and AFeC have been added to all panels of Figure 4 and 6, respectively.

      • Line 246 - The authors state: "there was some evidence of decreased malaria-induced hepatomegaly" however there is no significant difference between WT and TfR mice and both show significant hepatomegaly. I feel that this line should be reworded.

      Response: The sentence (line 252-254) has been reworded as follows:

      “Furthermore, while both genotypes developed malaria-induced hepatomegaly, there was a trend toward less severe hepatomegaly in TfrcY20H/Y20H mice (Figure S5C).”

      Significance (Required):

      This work is one of the first to attempt to define the requirements for cellular iron in malaria infection. This is a difficult topic, as infection and associated inflammation and the red blood cell destruction caused by malaria all have complex effects on iron within the body. This study fits well with previous observations showing that anemia can be protective as it both prevents parasite growth and limits immunopathology. This work advances the field by demonstrating a cell-intrinsic role for iron in malaria infection. There is a broad possible audience for this work, including malaria researchers, immunologists and people interested in the role of iron, both at a cellular level and systemically.

      Reviewer #2

      Evidence, reproducibility and clarity (Required):

      In this manuscript, the authors have studied the role of iron deficiency in the host response to Plasmodium infection using a transgenic mouse model that carries a mutation in the transferrin receptor. They show that restricted cellular iron acquisition attenuated P. chabaudi infection-induced splenic and hepatic immune responses, which in turn mitigated the immunopathology, even though the peak parasitemia was significantly higher in the mutant mice. Interestingly, the course of parasite infection doesn't seem to be affected in the mutant mice compared to the wildtype mice despite the induction of poor immune responses. The authors show that the decreased cellular iron uptake broadly impacts both innate and adaptive components of the immune system. Conversely, free iron supplementation restored the immune cell functions.

      • The study is well performed, and the manuscript is well written. However, the authors should show how conserved the role of cellular iron is across other rodent malaria parasite species, at least with P. yoelii or P. berghei blood stage infection models. This question becomes critical to address in order to understand broad relevance to human malaria infections where both the host and parasites are genetically diverse.

      Response: We thank the reviewer for appreciating our study and for the thoughtful comments. We agree with the reviewer that the diverse genetic background of both parasites and hosts makes it difficult to draw broad conclusions about human malaria infection from animal studies performed in a laboratory setting. The recently mosquito-transmitted P. chabaudi chabaudi AS blood-stage infection model replicates many key features of mild to moderate malaria infection in humans, such as low parasitaemia, anaemia, cyto-adhesive sequestration in microvasculature, and self-resolving immunopathology. Importantly, the immune response elicited by recently mosquito-transmitted parasites also more closely mimics the immune response to a natural infection (PMID: 23719378). Therefore, we consider the recently mosquito-transmitted P. chabaudi chabaudi AS model as the most relevant to answer our particular research questions.

      Furthermore, specific pathogen-free parasitised erythrocyte stabilates made from recently mosquito-transmitted P. berghei or P. yoelii parasites are unfortunately not readily accessible (e.g. through the European Malaria Reagent Repository), in contrast to P. chabaudi. Consequently, preparing and characterising recently mosquito-transmitted strains to perform the experiments suggested by the reviewer would require a substantial amount of additional time and labour, which we deem out of scope for this study.

      In the design of our model we have also taken care to minimise the effects of anaemia, something which would be difficult or impossible to achieve using serially blood-passaged P. yoelii or P. berghei parasites. Both P. yoelii and P. berghei merozoites preferentially invade immature RBCs (PMID: 34322397), making readouts such as parasitaemia far more sensitive to small variations in erythropoietic output. In addition, the extensive RBC destruction caused by most serially blood-passaged murine Plasmodium strains would likely exaggerate any erythropoietic impairment caused by the TfrcY20H/Y20H mutation.

      Although we strongly believe that the chosen mouse model of malaria is the most appropriate for our study, ultimately, no mouse model can replicate all features of human malaria infection. Inevitably, the direct relevance of animal studies for human infection will always be somewhat opaque. Hence, we respectfully disagree with the reviewer that repeating the experiments with additional murine malaria parasite species would allow us to extrapolate conclusions about human malaria infection. Such experiments would also conflict with the 3Rs principles that govern work with animals in the UK (https://nc3rs.org.uk/), especially because most strains of P. yoelii and P. berghei cause severe or non-resolving infections and have a significant negative impact on animal welfare.

      In our opinion, the logical continuation of this study must be to utilise the insights from our research to inform future human studies on the relationships between iron deficiency and malaria-related immunopathology. However, we agree that this is an important topic and have added a section addressing the broad relevance of our findings to the discussion (line 393-396):

      “It remains to be seen what the broader importance of cellular iron is in human malaria infection, in particular within the diverse genetic context of both humans and parasites found in malaria endemic regions. Murine models of malaria are useful in providing hypothesis-generating results, but such findings ultimately ought to be confirmed and developed further through studies in human populations.”

      • Since restricted cellular iron uptake mitigates the immunopathology, the authors should explore whether this could also relieve the cerebral malaria condition that is caused by hyperinflammation in the brain. They should use the P. berghei ANKA parasite strain, which causes cerebral malaria in mice. I think this would increase the impact of the paper.

      Response: Although we agree that this would be an interesting line of inquiry, we think that it is outside of the scope of this study, which predominantly aims to characterise and study the effects of cellular iron deficiency in host cells, particularly immune cells, during mild to moderate malaria infection. The severe pathology underlying cerebral malaria differs greatly from that of a self-resolving blood-stage infection. Furthermore, the relevance to human cerebral malaria of the P. berghei ANKA model is controversial within the field (PMID: 21288352) and as a severe infection its use would again conflict with the 3Rs principles.

      Minor comments:

      • Line 222: repeating word, "iron iron-supplemented...."

      Response: The sentence has been corrected (line 228).

      • Figure 3C, S4C & S5F: Why was the Mann-Whitney test performed in these particular graphs, whereas the rest of the two-group comparisons were done using Welch's test? The authors should clearly mention this in the methods section.

      Response: We apologise if this was unclear in the manuscript. We routinely tested all our datasets for normality to identify the appropriate tests for each dataset. In case of the graphs shown in figure 3C, S4C and S5F, the dataset did not pass the D’Agostino-Pearson normality test and we therefore applied a non-parametric test (i.e. Mann-Whitney), in contrast to the other datasets that passed the test for normal or lognormal distribution. This has been further clarified in the method section (line 581-586):

      “The D’Agostino-Pearson omnibus normality test was used to determine normality/lognormality. Parametric statistical tests (e.g. Welch’s t-test) were used for normally distributed data. For lognormal distributions, the data was log-transformed prior to statistical analysis. Where data did not have a normal or lognormal distribution, or too few data points were available for normality testing, a nonparametric test (e.g. Mann-Whitney test) was applied.”
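
      The decision rule described in this methods passage can be sketched as a small function. This is only an illustrative sketch, not the authors' analysis code: the function name, the significance threshold (ALPHA = 0.05) and the minimum group size for normality testing (MIN_N = 8) are assumptions that are not stated in the source, and the normality-test p-values are taken as inputs rather than computed.

```python
from typing import Optional

ALPHA = 0.05  # assumed threshold; the source does not state the cut-off used
MIN_N = 8     # assumed minimum group size for running the normality test


def choose_two_group_test(n: int,
                          p_normal: Optional[float],
                          p_lognormal: Optional[float]) -> str:
    """Pick a two-group test from normality-test results (hypothetical helper).

    n           -- size of the smallest group being compared
    p_normal    -- normality-test p-value on the raw data (None if not run)
    p_lognormal -- normality-test p-value on log-transformed data (None if not run)
    """
    # Too few points for normality testing: fall back to a nonparametric test.
    if n < MIN_N or p_normal is None:
        return "Mann-Whitney test"
    # Normality not rejected: use the parametric test directly.
    if p_normal > ALPHA:
        return "Welch's t-test"
    # Lognormality not rejected: log-transform, then use the parametric test.
    if p_lognormal is not None and p_lognormal > ALPHA:
        return "Welch's t-test on log-transformed data"
    # Neither normal nor lognormal: nonparametric test.
    return "Mann-Whitney test"


if __name__ == "__main__":
    print(choose_two_group_test(12, 0.40, None))
    print(choose_two_group_test(12, 0.01, 0.30))
    print(choose_two_group_test(12, 0.01, 0.02))
    print(choose_two_group_test(5, None, None))
```

      Keeping the p-values as inputs separates the choice of test from the statistics package used to compute them, so the same rule can be applied whichever software produced the normality results.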

      • Have the authors explored whether gamma-delta T cell responses are affected in the mutant mouse strain compared to wildtype mice, as they are among the early responders and the key cytokine-producing cells against the Plasmodium blood stage infection?

      Response: We thank the reviewer for this valuable comment. We briefly explored the role of γδT cells, but did not observe a significant difference in splenic γδT cell numbers between wild-type and TfrcY20H/Y20H mice, eight days post-infection (Reviewer Figure 1). It is of course possible that γδT cell numbers were affected at an earlier stage, or that γδT cell function (e.g. cytokine production) was affected by cellular iron deficiency during P. chabaudi infection. However, γδT cells may also be less sensitive to cellular iron deficiency than conventional T cells, as has been previously demonstrated for developing T cells (PMID: 7957580).

      A section was added to the discussion to address the role of innate immune cells in our model (line 354-363):

      “The inhibited innate immune response to P. chabaudi in TfrcY20H/Y20H mice likely contributed to both the increased pathogen burden and the decreased liver pathology. Splenic MNPs are important for controlling parasitaemia (34,35,72), but MNPs are also vital for maintaining tissue homeostasis and preventing tissue damage in malaria (43,73). Although other innate cells, such as neutrophils, NK cells and γδT cells are an important part of the immune response to malaria, only the MNP response was distinctly impaired in TfrcY20H/Y20H mice. Notably, neutrophils are known to be sensitive to iron deficiency (16,74) and to affect both immunity and pathology in malaria (75,76). However, in the context of recently mosquito-transmitted P. chabaudi it appears that monocytes and macrophages, rather than granulocytes, may be particularly important for parasite control and tissue homeostasis (43,72).”

      Significance (Required):

      Overall, the study provides novel insights into the role of iron in the immune response to Plasmodium blood stage infection using a rodent malaria model and the interplay of infection, immunity and the development of pathology. As such it is an important study.

      Reviewer #3

      Evidence, reproducibility and clarity (Required):

      Herein Wideman et al. provide novel and important evidence on the role of iron availability for mounting an efficient immune response in a malaria infection model. They employed TfRC Y20H/Y20H mice, which develop iron deficiency due to impaired cellular ingestion of transferrin-bound iron. They found that those mice develop higher peak parasitemia after vector-borne exposure to Pl. chabaudi chabaudi, which was paralleled by an impaired immune response as reflected by altered CD4 cell activation, reduced IFN-g formation or reduced B-cell responsiveness. Those deficiencies could be recovered upon ex vivo iron supplementation, pointing to the importance of iron availability for mounting CD4+ and B-cell specific anti-plasmodial immune responses at the initial phase of infection. However, TFRC mutated mice were able to clear infection over time in a comparable fashion to wt mice.

      This excellent study is important in convincingly showing (by employing high quality immunological analyses) the importance of cellular iron deficiency on immune responses in an infection model of general interest. It also indicates that overwhelming immune response as seen in wt mice is associated with organ damage over time.

      Minor comments:

      • The authors should discuss why and how TFRC mutated mice were able to control infection over time in a comparable fashion as wt mice although peak parasitemia was significantly higher?

      Response: We thank the reviewer for the helpful feedback on our study and for posing this interesting question. It does indeed appear as if the immune response, while significantly inhibited in the TfrcY20H/Y20H mice, is still sufficient to clear the infection. It is plausible that the early cell-mediated immune response is inhibited to the degree that parasite control is impaired, resulting in higher peak parasitaemia in TfrcY20H/Y20H mice. In contrast, parasite clearance is comparable and contemporaneous in both genotypes. Because parasite clearance occurs at a time when a substantial adaptive immune response is expected to emerge, we hypothesize that this response significantly contributes to pathogen clearance. Thus, it seems likely that the humoral response in TfrcY20H/Y20H mice, even if inhibited, may still be effective enough to clear the parasites and prevent recrudescence.

      As malaria infection progresses, RBC loss and increasing anaemia also contribute to limiting exponential parasite growth. This occurs more or less equally in both genotypes, but it could be particularly important for parasite control in the TfrcY20H/Y20H mice that have an inhibited immune response.

      We have added a section to the discussion to address this (line 380-386):

      “Despite the higher peak parasitaemia in TfrcY20H/Y20H mice, both genotypes were able to clear P. chabaudi parasites at a comparable rate and prevent recrudescence. It follows that even a weakened humoral immune response appears to be sufficient to control P. chabaudi infection. However, our study did not investigate the effects of immune cell iron deficiency on the formation of long-term immunity, which may have been more severely affected. The impaired GC response, in particular, suggests that iron deficiency could counteract the formation of efficient immune memory to subsequent malaria infections.”

      • The authors and others have previously shown (Frost J et al. Sci Adv 2022, Hoffmann et al. EBioMedicine 2021) that iron deficiency results in reduced neutrophil numbers in different infection models. This could also have contributed to the observed effect in initial infection control but may have also been linked to the altered histopathology seen in Figure 7. However, no mention of neutrophil numbers in this model is made. It would be important if the authors could provide information on neutrophil numbers (only if this analysis has been already performed) and discuss this issue in association with their observation.

      Response: We appreciate that the reviewer has brought attention to this important topic. As they mention, iron deficiency can have a negative impact on the neutrophil response (PMID: 36197985, 34488018) but it can also cause a maladaptive excessive neutrophil response due to failed adaptive immunity (PMID: 33665641). In this study, we show that there is no difference in splenic neutrophil numbers between wild-type and TfrcY20H/Y20H mice, eight days after P. chabaudi infection (Figure S3B). Moreover, the histopathologists detected no liver neutrophil infiltration in either genotype, but rather observed infiltration of mononuclear leukocytes upon P. chabaudi infection. Hence, it appears unlikely that neutrophils were a major contributor to differences in either immunity or pathology in this specific context. However, we cannot definitively rule out that neutrophil numbers were affected earlier in the infection or that neutrophil function was impaired due to cellular iron deficiency.

      A section was added to the discussion to address the role of innate immune cells in our model (line 354-363):

      “The inhibited innate immune response to P. chabaudi in TfrcY20H/Y20H mice likely contributed to both the increased pathogen burden and the decreased liver pathology. Splenic MNPs are important for controlling parasitaemia (34,35,72), but MNPs are also vital for maintaining tissue homeostasis and preventing tissue damage in malaria (43,73). Although other innate cells, such as neutrophils, NK cells and γδT cells are an important part of the immune response to malaria, only the MNP response was distinctly impaired in TfrcY20H/Y20H mice. Notably, neutrophils are known to be sensitive to iron deficiency (16,74) and to affect both immunity and pathology in malaria (75,76). However, in the context of recently mosquito-transmitted P. chabaudi it appears that monocytes and macrophages, rather than granulocytes, may be particularly important for parasite control and tissue homeostasis (43,72).”

      • In addition, alternative mechanism leading to immune tolerance and reduced tissue damage such as induction of heme oxygenase-1, which is also affected by systemic iron availability, should be discussed.

      Response: An addition was made to the results section and to Figure S5 to address this reviewer comment (line 269-274):

      “In addition, we measured the expression of two genes that are known to have a hepatoprotective effect in the context of iron loading in malaria: Hmox1 (encodes haemoxygenase-1) and Fth1 (encodes ferritin heavy chain). Liver gene expression of Hmox1 was higher in TfrcY20H/Y20H mice, while the expression of Fth1 did not differ between genotypes, eight days after infection (Figure S5H-I). Thus, the higher expression of Hmox1 may have contributed to the hepatoprotective effect in TfrcY20H/Y20H mice.”

      A relevant sentence was also added to the discussion (line 313-318):

      “For example, HO-1 plays an important role in detoxifying free haem that occurs as a result of haemolysis during malaria infection, thus preventing liver damage due to tissue iron overload, ROS and inflammation (62). Interestingly, infected TfrcY20H/Y20H mice had higher expression of Hmox1, but levels of liver iron and ROS comparable to that of wild-type mice. Consequently, this may be indicative of increased haem processing that could have a tissue protective effect.”

      Significance (Required):

      Important and interesting study highlighting the central role of iron homeostasis for the immune response to infection. General interest because iron deficiency has high prevalence in areas with a high endemic burden of infection.

      Reviewer's expertise: infectious disease, immunity, iron homeostasis -- both basic science and clinical expertise (more than 300 peer-reviewed publications on these topics)

    1. Author Response

      Reviewer #1 (Public Review):

      The cerebral cortex, or surface of the brain, is where humans do most of their conscious thinking. In humans, the grooves (sulci) and bumps (convolutions) have a particular pattern in a region of the frontal lobe called Broca's area, which is important for language. Specialists study features imprinted on the internal surfaces of braincases in early hominins by casting their interiors, which produces so-called endocasts. A major question about hominin brain evolution concerns when, where, and in which fossils a humanlike Broca's area first emerged, the answer to which may have implications for the emergence of language. The researchers used advanced imaging technology to study the endocast of a hominin (KNM-ER 3732) that lived about 1.9 million years ago (Ma) in Kenya to test a recently published hypothesis that Broca's remained primitive (apelike) prior to around 1.5 Ma. The results are consistent with the hypothesis and raise new questions about whether endocasts can be used to identify the genus and/or species of fossils.

      We would like to thank Rev. 1 for their comments on our paper.

      Reviewer #2 (Public Review):

      The authors tried to support the hypothesis that early Homo still had a primitive condition of Broca's cap (the region in fossil endocasts corresponding to Broca's area in the brain), being more similar to the condition in chimpanzees than in humans. The evidence from the described individual points to this direction but there are some flaws in the argumentation.

      We are grateful to Rev. 2 for their comments, although we only partially agree with some of them.

      First, we would like to rectify the statement of Rev. 2 that we “tried to support the hypothesis that early Homo still had a primitive condition of Broca's cap”. Our aim was to test this hypothesis, not to validate it.

      First, only one human and one chimpanzee were used for comparison, although we know that patterns of brain convolutions (and in addition how they leave imprints in the endocranial bones) are very variable.

      We understand the point raised by Rev. 2 about the variation of brain convolutions in humans and chimpanzees. We used atlases published by Connolly (1950), Falk et al. (2018) and de Jager et al. (2019, 2022) to analyse the endocast of KNM-ER 3732 and compare it to the extant human and chimpanzee cerebral conditions. However, in Figure 2, for the sake of clarity, only two specimens (one Homo and one Pan) were used to illustrate the comparison (as has been done in other published papers, e.g., Carlson et al., 2011 Science; Gunz et al., 2020 Sci Adv). In the revised version, we modified the manuscript to explain our approach further (line 156): “We used brain and endocast atlases published in Connolly (1950), Falk et al. (2018) and de Jager et al. (2019, 2022; see also www.endomap.org) for comparing the pattern identified in KNM-ER 3732 to those described in extant humans and chimpanzees. To the best of our knowledge, these atlases are the most extensive atlases of extant human and chimpanzee brains/endocasts available to date and are widely used in the literature to explore variability in sulcal patterns. In Figure 2, the extant human and chimpanzee conditions are illustrated by one extant human (adult female) and one extant chimpanzee (adult female) specimen from the Pretoria Bone Collection at the University of Pretoria (South Africa) and in the Royal Museum for Central Africa in Tervuren (Belgium), respectively (Beaudet et al., 2018).”

      Second, the evidence from this fossil specimen adds to the evidence of previously described individuals but still does not fully prove the hypothesis.

      We tempered our discussion by concluding that (line 116) “Overall, the present study not only demonstrates that Ponce de León et al.’s (2021) hypothesis of a primitive brain of early Homo cannot be rejected, but also adds information […]”.

      Third, there is a vicious circle in using primitive and derived features to define a fossil species and then using (the same or different) features to argue that one feature is primitive or derived in a given species. In this case, we expect members of early Homo to be derived compared to their predecessors of the genus Australopithecus and that's why it seems intriguing and/or surprising to argue that early Homo has primitive features. However, we should expect that there is some kind of continuum or mosaic in a time in which a genus "evolves into" another genus. This discussion requires far more discussions about the concepts we use, maybe less discussion about what is different between the two groups but more discussion about the evolutionary processes behind them.

      We fully agree with Rev. 2 on this aspect. We believe that identifying these differences/similarities between fossil and extant hominids constitutes the first step towards a better understanding of the evolutionary mechanisms. Our work indeed suggests a certain continuity between genera and raises questions about the genus concept and how to interpret the specimens currently attributed to early Homo. In the revised version of the manuscript we included a reference to this possible scenario (line 134): “[…] or to the absence of a definite threshold between the two genera based on the morphoarchitecture of their endocasts (Wood and Collard, 1999).”

      Fourth, the data of convolutional imprints presented are rather subjective when identifying which impressions represent which brain convolutions. Not seeing an impression does not necessarily mean that the corresponding brain feature did not exist. Interestingly, the manuscript does not mention or discuss the frontoorbital sulcus at all. This is a sulcus that usually runs from the orbital surface of the frontal lobe up to divide the inferior frontal gyrus in chimpanzees, a condition totally different from that in humans, who do not have a frontoorbital sulcus. Could such a sulcus be identified, this would provide a far more convincing argument for a primitive condition in this specimen. In Australopithecus sediba, e.g., the condition in this region seems to be a mosaic in which some aspects of the morphology seem to be more modern while one of the sulcal impressions can well be interpreted as a short frontoorbital sulcus. For this specimen, by the way, I would come back to my third point above: some experts in the field might argue that this specimen could belong to Homo rather than Australopithecus...

      We agree that the presence of a fronto-orbital sulcus would be more conclusive. However, this sulcus has not been identified in KNM-ER 3732 and the region in which we would expect to find it is not preserved. As demonstrated by Ponce de León et al. (2021), because of the topographic relationships between sulci (and cranial structures), it is possible to interpret imprints on endocasts and the evolutionary polarity of some traits even in the absence of landmarks such as the fronto-orbital sulcus. In Australopithecus sediba the main derived feature of the endocast corresponds to the ventrolateral bulge in the left inferior frontal gyrus, and not to the sulcal pattern itself (Carlson et al., 2011 Science). However, the discussion around the taxonomic status of this taxon confirms the urgent need for reconsidering specimens from that time period and clarifying the mosaic-like or concerted evolution of the derived Homo-like traits within our lineage. Regarding the subjective nature of this approach, we invite readers to examine the specimen on MorphoSource (https://www.morphosource.org/concern/media/000497752?locale=en) and to request access from the National Museums of Kenya to the physical or virtual specimen in order to falsify our hypothesis.

      According to my arguments above, I think that this manuscript might revive interesting discussions about this topic but it is not likely to settle them because the data presented are not strong enough to fully support the hypothesis.

      We would be more than happy to consider new/other specimens with similar chronological and geographical contexts and investigate further this hypothesis in the future.

      Reviewer #3 (Public Review):

      The authors provide a detailed analysis of the sulcal and sutural imprints preserved on the natural endocast and associated cranial vault fragments of the KNM-ER3732 early Homo specimen. The analyses indicate a primitive ape-like organization of this specimen's frontal cortex. Given the geological age of around 1.9 million years, this is the earliest well-documented evidence of a primitive brain organization in African Homo.

      In the discussion, the authors re-assess one of the central questions regarding the evolution of early Homo: was there species diversity, and if yes, how can we ascertain it? The specimen KNM-ER1470 has assumed a central role in this debate because it purportedly shows a more advanced organization of the frontal cortex compared to other largely coeval specimens (Falk, 1983). However, as outlined in Ponce de León et al. 2021 (Supplementary Materials), the imprints on the ER1470 endocranium are unlikely to represent sulcal structures and are more likely to reflect taphonomic fracturing and distortion. Dean Falk, the author of the 1983 study, basically shares this view (personal communication). Overall, I agree with the authors that the hypothesis to be tested is the following: did early Homo populations with primitive versus derived frontal lobe organizations coexist in Africa, and did they represent distinct species?

      I greatly appreciate that the authors make available the 3D surface data of this interesting endocast.

      We are grateful to Rev. 3 for their comments and for contextualizing our finding. We would also like to point out that, although the 3D surface can be viewed on MorphoSource, permission from the National Museums of Kenya has to be requested for studying the specimen and getting access to the physical specimen and/or the 3D model.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their comments and insights; we feel the manuscript is now greatly improved. Please find below our answers to the reviewers’ queries.

      Reviewer #1 (Evidence, reproducibility and clarity):

      The manuscript by Niccoli et al. describes the identification of a novel modifier of C9orf72-derived toxicity based on the manipulation of the brain metabolic pathways. The premise for this work is supported by strong literature describing the aberrant glucose metabolism in FTD, AD and other degenerative disorders. The idea tested here is whether increasing the import of pyruvate produced in glia into neurons reduces this toxicity. They test three different types of importers and find that one of them, Bumpel, the orthologue of human SLC5A12, suppresses toxicity and reduces the accumulation of arginine-containing repeats, GP and PR. The authors investigate several potential mechanisms mediating this reduction of toxic DPRs, but do not find strong evidence linking pyruvate import and increased autophagy or mitochondrial metabolism.

      Overall, this is an interesting discovery based on a candidate approach that shows the power of Drosophila to efficiently identify novel mediators of neurodegeneration. The article is well written, although more detailed explanations of some experiments would be helpful. The weaknesses of the manuscript are the lack of a clear mechanism mediating the protective activity of pyruvate, the incomplete experiments lacking relevant controls, and the presentation of western blots.

      Specific comments:

      1. The reduced levels of DPRs require that the expression of C9 mRNA or the GR and PR constructs is examined by qPCR. In figure 3E, GP is not even detectable.

      We agree with the reviewer; ideally we would have measured the RNA by qPCR. However, the C9 repeats and the DPR constructs are highly repetitive, so it is impossible to do a qPCR for them. The upstream and downstream sequences are identical for the C9 and the bumpel constructs, and there is, to our knowledge, no unique sequence we could use to measure levels of expression in the presence of bumpel.

      We did run a GFP control (Fig 2D) and did not see any difference and we have now carried out a qPCR for Gal4-GeneSwitch (Fig S3) to show that the levels of the driver do not change.

      1. I wonder if there are constructs available to silence Bumpel or overexpress the human orthologues of bumpel. These would be nice controls for the effects observed with the Bumpel overexpression

      This would be an extremely interesting experiment. However, bumpel is normally only expressed in glia, so we can’t down-regulate it in glia whilst upregulating 36R in neurons, as we are limited to one driver (everything is driven by the Gal4/UAS system). Expression of C9 in glia does not have a clear phenotype (our observation), so we can’t drive both in glia. We tried over-expressing the human homologue SLC5A12, but it did not rescue the C9 phenotype (data not shown), possibly because it requires (like other human SLC5A-type transporters) PDZK1 as an extra co-factor (Srivastava S. et al, 2019), which is not present in flies.

      1. The argument about bumpel modulating autophagy downstream of Atg1 is not supported by the experimental data

      We now have imaging data showing that bumpel modulates the formation of lysosomes, downstream of Atg1 (Fig 5). We also show that bumpel and Atg1 can act synergistically, leading to a much stronger rescue of C9 expression (see Fig 5I), which also suggests that the two are acting at different points in the same pathway. We also show that bumpel rescues the downregulation of TFEB targets (Fig 5J).

      1. Western blots throughout show no control lanes and in several occasions are created with cutout bands. The standard for this type of experiments should be more stringent, with entire gels showing all experimental conditions, which requires consistent methods and results vs selecting the best bands from different gels.

      We apologise if this was misunderstood. The lanes shown are all from the same blot, on which other samples were also run; including those would be confusing for the reader. Where sample remained from our quantifications, we have re-run it so that the lanes are now contiguous, and we provide the original blot images in the supplemental information for those we could not re-run. The control for all experiments is the C9-expressing line without bumpel, and this is always present. If the reviewer means that -RU controls are missing, these do not produce any DPRs and so are not included in western blot or ELISA quantifications, as the signal is not above background.

      1. For figures 2B and 5C, please, show representative WBs

      These are ELISA quantifications, not western blots. We chose to run ELISAs where possible, as they are more quantitative.

      1. Figure 5D describes the survival curve as significantly rescued. Statistical tests can indicate differences, but that is in no way convincing. The test may show the curves are different, but the abeta Atg1 flies also seem to start falling early, so an argument could be made in both directions, as a suppressor or an enhancer.

      We agree the rescue is not strong enough, and we have now removed this lifespan.

      1. It is unclear why several results are placed in the supplemental materials. In general, all this material seems highly relevant and related to what is shown in the main figures

      We are happy to include them in the main manuscript if this would help the reader, and we have now placed all mitochondrial data in Fig 4.

      Minor comments:

      Please, define several abbreviations throughout

      We apologise for this oversight; we have now done this.

      A couple of sections could be improved by carefully sequencing human vs Drosophila background to advance the argument rather than going in circles. There is also a section on mitophagy in between two sections related to autophagy that could be sequenced better.

      We have re-structured the sections, we think this has improved the flow.

      There is a sentence at the end of page 6 that seems misplaced

      We apologise for the oversight, and we have removed this.

      Reviewer #1 (Significance):

      Overall, this is an interesting discovery based on a candidate approach that shows the power of Drosophila to efficiently identify novel mediators of neurodegeneration. The article is well written, although more detailed explanations of some experiments would be helpful. The weaknesses of the manuscript are the lack of a clear mechanism mediating the protective activity of pyruvate, the incomplete experiments lacking relevant controls, and the presentation of western blots.

      We thank the reviewer for the helpful comments. We have added some details in the methods section, and we apologise for not having made it clear that the westerns were all derived from the same blot (we have now placed the originals in the supplemental materials). Regarding mechanism, we now show that bumpel over-expression increases clearance of late-stage autolysosomes, possibly by increasing transcription of TFEB target lysosomal genes.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary:

      Project investigates the role in dementias of glial glucose uptake, conversion to lactate and shuttling via transporters to neurons to produce pyruvate to fuel TCA cycle production of ATP. The experiments are conducted in Drosophila melanogaster, which has become a powerful model system for understanding neurodegeneration mechanisms associated with ALS/FTD-associated C9orf72 pathology. Bumpel misexpression is shown to rescue the early death phenotype in flies expressing a C9orf72 expansion and in flies expressing arginine-containing di-peptide repeat proteins. The report describes novel insight into the function of bumpel, demonstrating that this conserved orthologue of human SLC14A functions as a sodium exchange transporter for the monocarboxylates pyruvate and lactate. From these findings, the authors conclude that increased neuronal pyruvate, but not its metabolites, rescues C9orf72-associated pathology.

      The authors next set out to describe the mechanism by which increased pyruvate rescues survival in C9orf72-expressing flies. Levels of autolysosomes were increased in C9orf72-expressing flies, and stimulation of autophagy by overexpression of atg1 was shown to decrease levels of DPRs (though not to the same extent as bumpel expression). Expression of bumpel in C9orf72 flies led to a modest increase in LC3-II, indicating increased autophagy. Co-overexpression of bumpel and atg1 did not have an additive effect, suggesting bumpel activates autophagy downstream or independent of atg1 activity. Finally, the authors extend their findings to amyloid models, suggesting a common protective mechanism for elevating neuronal pyruvate levels in neurodegenerative disease.

      Major comments

      Prior data suggests that bumpel is expressed in glia (for example Yildirim et al 2022). In their study the authors do not present any data to demonstrate that the transporter is normally expressed in neurons in flies. This calls into question the physiological relevance of their findings, that neuronal upregulation of bumpel is protective against C9orf72-associated pathology in neurons, from which it is reasonable for a reader to conclude that bumpel may be a neuronal target for therapeutic intervention. However, the report demonstrates well that, regardless of whether the transporter is native to neurons, the increase in monocarboxylates it facilitates is protective against C9orf72 pathology, and thus the overall conclusion of the project is supported by experimental evidence. The point of upregulation of a natively expressed gene versus misexpression of a glial-enriched transporter should be considered in a bit more detail in the discussion text. The authors may consider speculating on the identity of members of the sodium-coupled monocarboxylate transporters that are enriched in neurons. Are any of the bumpel human orthologues expressed in neurons?

      We thank the reviewer for this comment and suggestion. The reviewer correctly points out that we do not show whether there is a defect in pyruvate import in C9-expressing flies. We could not identify a validated sodium-coupled pyruvate transporter in flies with strong neuronal expression, and we have added a comment in the discussion about this. There are a number of human homologues; some, such as SLC5A8, are expressed in neurons, thus providing a possible therapeutic target. We have added a sentence to this effect in the discussion.

      [OPTIONAL] cDNA overexpression of neuron-specific sodium-coupled monocarboxylate transporters in C9orf72 fly models would strengthen the conclusion on their physiological relevance for ALS/FTD. Fly lines for these are not available in repositories, but could be generated and tested at reasonable cost (<£700, ~3 month duration).

      This would be an ideal experiment, however, we could not find a neuronal sodium coupled transporter which is known to import monocarboxylates. There are a number of sodium coupled neuronal transporters, but they are mostly homologous to SLC5A6, which is a glucose coupled transporter. Going forward, we will screen a number of transporters to identify if there are any which import pyruvate.

      The role of bumpel expression in survival (Figure 1) could be a technical artifact due to dilution of Gal4 between C9orf72 and bumpel-ORF transgenes. No expression control is shown (for example GFP, LacZ etc). This theory is unlikely, as no improvement in survival was seen for the SLC14A class of transporters, which have a matching site-directed transgene insertion. For clarity, this point relating to controls should be commented on in the text.

      The reviewer is correct, there could be a dilution of the Gal4. We don’t like using GFP as a control, as we have often seen a worsening when expressing other highly stable proteins at high levels. We have generated an “empty” flyORF line (generated by injecting the empty plasmid into the identical attP site) and used it as a control to check for dilution effects. Bumpel still rescued relative to this control, and we now include this in the supplementary material (Fig S1B).

      Reduced Mito-GFP levels are used to support a role for bumpel in increasing mitophagy. As mito-GFP is a marker for mitochondria but not specifically mitophagy, an alternative explanation for decreased levels could be reduced mitochondrial biogenesis. The text should be amended to clarify this point.

      The role of Pink1 RNAi in modifying mitophagy is a bit overstated. Whilst Pink1 is involved in stress-associated mitophagy, its role in basal mitochondrial turnover is less well defined. The text should be adapted.

      We have added qualifying statements regarding the possibility of reduced mitochondrial biogenesis, and the fact that Pink1’s role in basal mitophagy is not very clear. The use of the mitophagy-inducing drug Kaempferol, however, suggests that mitophagy is unlikely to be a cause of the DPR reduction.

      Minor comments

      The introduction describes the current state of C9orf72 fly models well. It would benefit from a few comparable lines for AD models. The first paragraph of the results may also be better placed in the introduction.

      We thank the reviewer for the suggestion, and have added a more in-depth introduction to Aß and have moved the first paragraph of the results section to the introduction.

      Figure 1 presents survival for three SLC16A transporters and bumpel. The C9 control curve appears to be consistent between charts, likely indicating that the same control was used across experiments, rather than independent controls for each chart. The authors should consider either showing all SLC16A and bumpel data on a single chart, or clarifying in the figure legend that a common control dataset is used. A GFP control is used in later experiments (Figure 2).

      We have now indicated that the SLC16A transporters were run together in the figure legend.

      Choice of amyloid model needs a line of explanation, particularly with regard to extra/intracellular deposition of amyloid in this model.

      We have now added a few sentences describing this when the model is introduced

      Fruit Fly Injection method section needs a bit more detail to describe site of injection (head, body etc). This is not clear in the result section either.

      We have now added this, the injection was done in the abdomen.

      How were bumpel orthologues identified? What degree of conservation (sequence homology etc?)

      The bumpel orthologues are those identified as most similar by flybase. We have now added the degree of conservation in the text

      The speculative mechanism for C9 pathology modification involves interaction of neurons and glia, monocarboxylate transporters and changes in autophagy activity. For clarity a diagram showing the model may be a helpful addition.

      We have now added a diagram explaining how we think the rescue is achieved

      Typos:

      Figure 1 Legend - "p values of ona way ANOVA "

      We apologise for the error, and have now corrected it

      Figure S2 Legend - Atg1 RNAi genotypes from S2 legend are mentioned erroneously

      We apologise for the error, and have now corrected it

      Repetition of text in results: "Bumpel, together with its paralogues kumpel and rumpel, is expressed in glia in flies, where it is thought to promote transport of substrates across the brain (31)."

      We apologise and have rectified this

      "Modulation of Atg1 when bumpel was co-overexpressed, however, did not affect GP levels (Fig 4E, F)" - Should be referring to Fig 4D, E

      We apologise and have rectified this

      Reviewer #2 (Significance):

      The study will be broadly of interest to researchers working in the fields of neurodegeneration and metabolism, providing evidence for a protective role of elevated pyruvate in neurons that provides new understanding of pathology in C9orf72-associated motor neuron disease and frontotemporal dementia.

      Strengths:

      The study presents novel data to demonstrate that overexpression of the fly monocarboxylate transporter bumpel rescues an early death phenotype associated with the ALS/FTD gene C9orf72. Any novel therapeutic strategies for ALS are of interest to the field, and the strategy demonstrated here may be readily translated to human cell culture systems for proof-of-principle translational studies in a more physiologically relevant system. This study further demonstrates the utility of invertebrate models to generate novel understanding of C9orf72 pathology.

      Limitations:

      The study speculates that there is a link between pyruvate levels and increased autophagy; however, the mechanism by which this occurs is not defined in the present study. This is a limitation of the experiment, though it opens up an interesting question for future studies.

      We thank the reviewer for their comments, and we have now added experiments characterising the role of bumpel in autophagy, particularly showing its rescue of a late autolysosomal block.

      Reviewer expertise: The reviewer researches ALS and dementia associated neurodegeneration, utilising Drosophila, rodent and stem cell derived model systems.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This is an interesting manuscript in which the authors provide evidence that elevated neuronal expression of the pyruvate transporter bumpel can partially rescue shortened lifespan in fly models of frontotemporal dementia and Alzheimer's disease. In addition, elevated neuronal bumpel expression can reduce accumulation of arginine containing FTD-linked dipeptide repeat proteins. Some evidence is presented that elevated neuronal bumpel expression may activate autophagy. These findings are novel and may have implications for therapeutic interventions based on pyruvate import/metabolism to treat neurodegenerative disorders. However, I have several concerns as follows:

      Major Comments:

      1. The authors provide no explanation as to why they targeted bumpel overexpression in neurons. Endogenous bumpel appears to be predominately expressed in glia cells so why not target these cells instead?

      We wanted to increase pyruvate import in neurons, so we over-expressed a number of pyruvate transporters that were available in the fly ORF stock centre (so that they would all be inserted into the same site and therefore directly comparable); we were mainly interested in cell-autonomous effects of importing glycolytic metabolites. Over-expressing bumpel in glia would indeed be an extremely interesting experiment. Unfortunately, we do not have the ability to express C9 in neurons while over-expressing bumpel in glia, as we only have one over-expression system that works. We are working towards generating a new C9 model so that we can then use the Gal4 system to over-express bumpel in glia, but this is not yet available. Over-expression of C9 in glia is not toxic and is not a good model of disease.

      1. Data is shown that overexpressed bumpel can suppress GR and PR dipeptide repeat toxicity when these peptides are translated using an ATG start codon (Fig 2D,E). Does bumpel mediated neuroprotection also correlate with a reduction in DPR levels driven with an ATG start codon?

      This is a very interesting question. Unfortunately, whilst the Isaacs lab kindly made available the GR antibody for the initial ELISA experiment, we no longer have that antibody available and we do not have a working PR antibody. GR and PR westerns are not possible to carry out, as the proteins are too positively charged to run. We do show that bumpel can down-regulate Aß from a UAS promoter, so its effect is not specific to RAN translation.

      1. The authors provide some evidence suggesting that overexpression of bumpel increases autophagy in the fly brain. However, knockdown of Atg1 while co-expressing bumpel (Fig 4E) did not result in increased GP protein levels. In addition, Atg1 knockdown did not attenuate the protective effects of bumpel overexpression (Fig 4I), suggesting that bumpel is working through a pathway independent of autophagy to promote DPR clearance and protection against toxic peptide accumulation. The authors need to modify the interpretation of their data and temper their claim that autophagy contributes to bumpel-mediated protective effects in the CNS.

      We apologise that the data were not strong enough. We have now added evidence that bumpel acts downstream of Atg1, on late-stage autolysosomal clearance. We also show that bumpel and Atg1 can act synergistically to improve the C9 phenotype when over-expressed; this is now described in Fig 5.

      1. Although the authors present evidence that increased bumpel expression can activate autophagy, the data is not convincing that the neuroprotective effects associated with bumpel are mediated through autophagy. Pyruvate, in some circumstances, can non-enzymatically scavenge hydrogen peroxide or in other cases trigger oxidative stress resistance through hormetic ROS signaling. The authors should consider these alternative possibilities.

      These are indeed possibilities, and we have added a sentence to that effect in the discussion. We have now also shown that bumpel affects late clearance of autolysosomes and leads to an increase in TFEB targets.

      1. The authors rely on overexpressing bumpel to attenuate C9 toxicity in flies. They should perform the opposite experiment and knock down bumpel to demonstrate that reduced bumpel expression results in potentiation of C9 and amyloid beta neurotoxicity. In addition, they should show that knockdown of bumpel expression has some effect on autophagy.

      This would be a very interesting experiment. Unfortunately, bumpel is expressed only in a few glial subtypes in a wild-type fly, and we can’t downregulate it in glia while over-expressing toxic proteins in neurons: because of limitations of our expression system, both genes need to be over-expressed in the same cell type. We have tried downregulating bumpel in neurons and see no effect on phenotype or on DPR levels, but bumpel expression in neurons is extremely low. Moreover, bumpel has two paralogs, rumpel and kumpel (also only present in glia), and all three need to be knocked out for phenotypes to become visible in glia (Yildirim et al, 2022). These experiments would be interesting but are outside our scope.

      We are in the process of generating new C9 models to be able to do these experiments, but these are currently outside the scope of this work.

      Minor Comments:

      1. Neuronal overexpression of bumpel appears to shorten lifespan of wild type flies (Fig 2A). It is possible that neuronal import of pyruvate may drive mitochondrial oxidative phosphorylation and ROS formation. The authors should comment on this possibility in the discussion.

      This is a very good point, we have added a point to that effect.

      1. In Fig 3 the authors used a mixture of sodium pyruvate and ethyl pyruvate to demonstrate the import properties of bumpel. The rationale for using ethyl pyruvate is unclear as this membrane-permeable metabolite can by-pass any transporters.

      The ethyl pyruvate was only used in the injection of flies, not for the FRET experiments looking at the import properties of bumpel. Since we were not over-expressing bumpel, we needed the pyruvate to bypass the requirement for a transporter. We were showing that delivery of pyruvate by another method (other than by a transporter) was able to phenocopy the over-expression of bumpel, thus showing that the effect is mediated by pyruvate entry into the cell.

      1. In the introduction several acronyms are used (i.e. GRN, MAPT, TREM2) that are not defined.

      We apologise and have now rectified this.

      Reviewer #3 (Significance):

      To my knowledge, this is the first study to identify that bumpel can permit the import of pyruvate and lactate into neurons when ectopically expressed in the fly brain. The fact that increased neuronal pyruvate import can partially protect against toxic peptide accumulation is unexpected and quite novel. Although some evidence is presented that bumpel can trigger autophagy, it is not clear if autophagy is mediating bumpel neuroprotective effects. Alternative mechanisms related to pyruvate effects on ROS and oxidative stress resistance should be considered.

      We thank the reviewer for their comments, and have added clarifying statements regarding the potential role of ROS.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank the reviewers for their comments and suggestions, which were very helpful to improve our manuscript. The revised manuscript notably includes the following improvements:

      • To evaluate the relevance of identified candidate target genes, we integrated an additional screening step in our method, corresponding to the analysis of RNAseq datasets specific to blood or brain cells. RNAseq data from irradiated hematopoietic stem cells or splenic cells were analyzed and included in the new Table S19, and RNAseq data from Zika virus-infected neural progenitors were analyzed and included in the new Table S28. In addition, we also verified that the expression of a subset of blood-related genes was decreased in the bone marrow cells of p53Δ31/Δ31 mice, known to exhibit increased p53 activity and to phenocopy dyskeratosis congenita (new Figure S8).
      • Luciferase data were expanded to show that, for promoters exhibiting a significant p53-mediated repression in luciferase assays, the p53-dependent regulation was abrogated after mutation of the putative DREAM binding site (new Figures 2e and 2i).
      • We found putative DREAM binding sites for 151 targets, and the predicted binding sites were precisely mapped relative to the position of ChIP peaks of DREAM subunits (E2F4 and LIN9) and to transcription start sites of target genes. These additional analyses, shown in the new Figures 3a and 3b, further suggest the reliability of our predicted binding sites. Notably, hypergeometric tests of the distribution of DREAM binding sites relative to E2F4/LIN9 ChIP peaks reveal a significant >1300-fold enrichment of these sites at ChIP peaks.
      • We now present a detailed comparison of our results with those reported in other studies, notably the predicted E2F and CHR sites from the Target gene regulation database (new Figure S11), or the list of candidate DREAM targets suggested from Lin37 KO cells (new Figure S10 and new Table S35). This also leads us to discuss the different types of DREAM binding sites (bipartite sites (e.g. CDE/CHR or E2F/CLE) vs sites composed of a single E2F or a single CHR motif).
      • We integrated updates of the Human phenotype ontology website to include the latest lists of genes related to blood or brain ontology terms in our analysis. In the previous version of the manuscript we had analyzed a total of 811 genes downregulated ≥ 1.5 fold upon bone marrow cell differentiation. Our revised manuscript now includes the analysis of 883 genes.
      • Several improvements were made to present our results more clearly and in more detail: 1) additional evidence that the differentiation of Hoxa9ER cells correlates with p53 activation is now provided in the new Figure S1; 2) the precise values for gene expression after bone marrow cell differentiation, as well as p53 regulation scores from the Target gene regulation database, are included in the new Tables S1, S5, S8, S11, S14, S20 and S23; 3) a Venn-like diagram was included to summarize the different steps of our approach in the new Figure 3c, with detailed lists of genes selected at each step in new Tables S17 and S26; 4) for genes associated with blood or brain genetic disorders, bibliographic references describing gene mutations and clinical traits were included in a new Table S36; 5) Figure 4a and Table S37 were improved to include evidence that increased BRD8 in glioblastoma cells leads to a decreased expression of several genes transactivated by p53.

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      In this paper the authors describe a data-driven approach to identify and prioritise p53-DREAM targets whose repression might contribute to abnormal haematopoiesis and brain abnormalities observed in p53-CTD deleted mice. The premise is that in these mice (where they have previously demonstrated p53 to be hyperactive in at least a subset of tissues), the p53-p21-E2F/DREAM axis is at least in part responsible for observed phenotypes due to the repression of E2F and CDE/CHR element containing genes. Their approach to home in on relevant genes is based on transcriptomic gene ontology analysis of genes repressed in these disease settings. They primarily use publicly available data from a HOXA9-ER regulated model of HSC expansion, in which they observe increases in p53-p21 expression upon differentiation and demonstrate that p53-p21 DREAM target genes are suppressed, as we would expect in this scenario where p53-p21 is activating withdrawal from cell cycle. They then spend a lot of effort analysing these datasets, combining "gene-ontology", "disease phenotype" and "meta-ChIP-seq" analysis of public data to support the observation that mutations of genes suppressed in this manner are disproportionately linked to heritable haematopoietic and brain disorders. While these results are interesting in terms of framing a hypothesis about how mutations in p53-p21-DREAM regulated targets contribute to such conditions, they are to be expected given the now very well described impact of p53-p21 on both E2F4/DREAM targets.

      We agree with the referee that the impact of p53-p21 on both E2F4/DREAM targets is well described. However, discussions with many scientists or clinicians specialized in bone marrow failure syndromes or microcephaly diseases led us to realize that most were not familiar with the p53-DREAM pathway, so that a study that would bridge the gap between DREAM experts and bone marrow or microcephaly specialists would be particularly useful. In addition, we thought that strategies that would rely on disease-based ontology terms were likely to identify new targets, compared to previous studies that considered cell cycle regulation instead of disease phenotypes. Consistent with this, many genes we identified as candidate DREAM targets were not reported in previous studies. In addition, as detailed below, our positional frequency matrices led us to identify DREAM binding sites that had not been predicted by previous approaches.

      The natural progression of this work would be to go on to show this occurs in relevant cells or tissues derived from the p53-CTD mice as well as look at modulating target genes to understand underlying mechanisms and consequences.

      Rather than this, they focus on validating that a sub-set of these targets are indeed suppressed by specific p53 activation by MDM2 inhibitor Nutlin-3A in MEFs by qPCR and that mutation of predicted CDE CHR elements in luciferase constructs leads to increased luciferase activity. While these findings support their predictions, the results are entirely expected based on what is known about such targets and demonstrating that this occurs in MEFs does not closely relate to haematopoietic and brain cells they suggest this regulation is important. In fact, in the discussion, the authors comment on the importance of cell type context specificity in terms of discordance between predictions of TF binding sites and public datasets.

      We agree that additional data from relevant cells or tissues were required to strengthen our conclusions. In the revised manuscript, we evaluated the relevance of candidate target genes related to blood ontology terms by integrating an additional screening step in our method, corresponding to the analysis of RNAseq datasets specific to blood cells. We analyzed dataset GSE171697, with RNAseq data from hematopoietic stem cells of unirradiated p53 KO, or unirradiated or irradiated WT mice, as well as dataset GSE204924, with RNAseq data from splenic cells of irradiated p53Δ24/- or p53+/- mice. The latter dataset appeared interesting because p53Δ24 is a mouse model prone to bone marrow failure and the spleen is a hematopoietic organ in mice. The analysis of these datasets is included in the new Table S19. In the datasets, increased p53 activity correlated with the downregulation of most of the 269 candidate DREAM targets. However, 56 genes which appeared upregulated in cells with increased p53 activity were considered poor candidate p53-DREAM targets and removed from further analyses, leading to a list of 213 genes that appeared as better candidate p53-DREAM targets related to blood abnormalities. Furthermore, we also verified that the expression of a subset of blood-related candidate genes was decreased in the bone marrow cells of p53Δ31/Δ31 mice (prone to bone marrow failure) compared to bone marrow cells from WT mice. This result is presented in the new Figure S8.

      As for genes related to brain development, we discussed in the previous version of the manuscript that most genes mutated in syndromes of microcephaly or cerebellar hypoplasia are involved in ubiquitous cellular functions (chromosome condensation, mitotic spindle activity, tRNA splicing…), which suggested that our analysis of transcriptomic changes associated with bone marrow cell differentiation might also be used to identify brain specific targets. However, we agree with the referee that confirmation of these brain specific targets in a more relevant cellular context was preferable. In the revised manuscript, we included the analysis of datasets GSE78711 and GSE80434, containing RNAseq data from human cortical neural progenitors infected by the Zika virus (ZIKV) or mock-infected, because ZIKV was shown to cause p53 activation in cortical neural progenitors and microcephaly. This analysis is detailed in the new supplementary Table S28. In both datasets, increased p53 activity correlated with the downregulation of most of the 226 candidate DREAM targets. Sixty-four genes which appeared more expressed in ZIKV-infected cells were considered poor candidate p53-DREAM targets and removed from further analyses, leading to a list of 162 candidate p53-DREAM targets related to brain abnormalities. We think this significantly increases the relevance of our analysis of brain-specific targets.
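
      The additional screening step described in the two responses above can be sketched as a simple filter. This is a minimal illustration and not the authors' pipeline: the rule used here (drop a gene when its mean log2 fold change across datasets is positive), the function name filter_by_rnaseq, the placeholder gene GeneX and all numeric values are assumptions. Only the gene symbols Fanca and Rtel1 and the counts checked at the end come from the text.

```python
from statistics import mean


def filter_by_rnaseq(log2fc_by_gene, threshold=0.0):
    """Split candidates into kept/removed lists.

    log2fc_by_gene maps a gene symbol to its log2 fold changes
    (high-p53-activity condition vs. control), one value per dataset.
    Assumed rule: a gene whose mean log2 fold change is above the
    threshold looks upregulated and is treated as a poor candidate.
    """
    kept, removed = [], []
    for gene, values in log2fc_by_gene.items():
        if mean(values) > threshold:
            removed.append(gene)
        else:
            kept.append(gene)
    return kept, removed


# Hypothetical fold changes, for illustration only.
example = {
    "Fanca": [-1.2, -0.8],  # downregulated in both datasets
    "Rtel1": [-0.9, -0.4],  # downregulated in both datasets
    "GeneX": [0.7, 0.3],    # placeholder gene, upregulated
}
kept, removed = filter_by_rnaseq(example)

# Counts reported in the responses: candidates minus removed genes.
blood_remaining = 269 - 56   # blood-related list
brain_remaining = 226 - 64   # brain-related list
```

      With real data, the threshold and the way several datasets are combined would need to follow the criteria given in Tables S19 and S28.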

      Finally, they try and contextualise effects in glioblastoma data by correlating target gene expression with levels of BRD8 since it has recently been shown to attenuate p53 function in glioblastoma and show that some of the brain disease associated genes are expressed at higher levels in BRD8 high patient samples. It seems strange here that they do not also look at expression of p21 or other p53 targets that would help ascertain if p53 activity is indeed suppressed. Moreover, much more elegant methods for predicting transcription factor activity could be applied to this data.

      We agree with the referee. Indeed, when we had performed the analysis of glioblastoma cells, we first verified that increased BRD8 levels correlated with decreased p21 levels in these cells. However, we had not included this verification in the previous version of the manuscript. In this revision, we improved Figure 4 (and Table S37), which report the analysis of glioblastoma cells, to address this point. In Figure 4a, we now show the variations in mRNA levels between BRD8Low and BRD8High tumors, for BRD8 itself, as well as 5 genes well-known to be transactivated by p53 (p21, MDM2, BAX, GADD45A and PLK3) and the 77 p53-DREAM targets associated with microcephaly or cerebellar hypoplasia. The data clearly show that tumors with high BRD8 exhibit a decrease in the expression of p53 transactivated targets, and an increase in p53-DREAM repressed targets.

      Major Comments

      The major result of this paper as it stands is the prioritisation of candidate genes in the p53-DREAM pathway involved in these conditions, and their refined approach used to identify and prioritise these genes, and is as such more of a starting point for further investigation. They fall short of demonstrating the relevance of their predictions physiologically in tissues from the mice and do not demonstrate functional importance of regulation of targets they put forward. Given that these genes will be co-ordinately regulated, without a mechanistic experiment in a physiologically relevant model it is impossible to infer causality. For example, depleting individual targets in the HOXA9 model and evaluating impact on survival, proliferation and differentiation may be a (relatively) simple way to explore this, perhaps comparing to effects of p53 activating agents such as Nutlin-3A. Of note the authors (Jaber 2016 PMID: 27033104) and several other groups (Fischer 2014 PMID: 25486564, McDade 2014 PMID: 24823795) had previously demonstrated the link between p53-p21 and suppression of DNA-repair/Damage related genes (as is also observed here, in particular FA-related genes that they discuss briefly here). I would have thought that this would be an obvious starting point for some mechanistic experiments and in fact I note this has been demonstrated before (Li et al 2018 PMID: 29307578).

      The starting point of our study is not the prioritization of DREAM target genes, but rather the detailed phenotyping of p53Δ31/Δ31 mice that we performed in previous publications (Simeonova et al. Cell Rep 2013, Toufektchan et al. Nat. Commun. 2016), in which we mentioned phenotypical traits typical of dyskeratosis congenita and Fanconi anemia, including notably bone marrow failure and cerebellar hypoplasia.

      We understand that depleting individual targets in the Hoxa9 system and evaluating impact on survival, proliferation and differentiation might seem appropriate to explore their potential causality. However, our previous work on Fanc genes leads us to think that this might not be informative. Regarding this, we now clearly discuss in the revised version of the manuscript: “Finding a functionally relevant [DREAM binding site] for Fanca, mutated in 60% of patients with Fanconi anemia [59,60], may help to understand how a germline increase in p53 activity can cause defects in DNA repair. Importantly however, we previously showed that p53Δ31/Δ31 cells exhibited defects in DNA interstrand cross-link repair, a typical property of Fanconi anemia cells, that correlated with a subtle but significant decrease in expression for several genes of the Fanconi anemia DNA repair pathway, rather than the complete repression of a single gene in this pathway [25]. Thus, the Fanconi-like phenotype of p53Δ31/Δ31 cells most likely results from a decreased expression of not only Fanca, but also of additional p53-DREAM targets mutated in Fanconi anemia such as Fancb, Fancd2, Fanci, Brip1, Rad51, Palb2, Ube2t or Xrcc2, for which functional or putative [DREAM binding sites] were also found with our systematic approach.” We further discuss in the manuscript how this may also apply to telomere-, ribosome-, or microcephaly-related genes.

      The analysis of brain specific targets and the link to BRD8 sits largely as an aside and the analysis of patient data from glioblastomas is underdeveloped as noted above.

      As we previously mentioned, the revised manuscript includes the analysis of RNAseq datasets from human cortical neural progenitors infected by the Zika virus (ZIKV) or mock-infected, which significantly increases the relevance of our analysis of brain-specific targets. Furthermore, we improved Figure 4 to present more clearly the impact of BRD8 levels on the expression of genes transactivated by p53 or repressed by p53-DREAM.

      The computational methods applied are robust, albeit predominantly correlative, in terms of identifying regulation of potential causative target genes, validated across human and mouse cell lines, and this indicates a role of these genes in the relevant conditions. However, further validation through application in a bulk or single cell RNAseq patient cohort, or at least an in vivo model would strengthen these conclusions and complement the work presented here which is based on in vitro mouse and human cells. This is pertinent as this study improves upon previously published approaches by focusing on "clinically relevant target genes". Additionally, this would exhibit the potential applications of the findings presented.

      We thank the referee for this comment. As mentioned above, in the revised manuscript we analyzed RNAseq data from hematopoietic stem cells of unirradiated WT or p53 KO mice, or irradiated WT mice, and from splenic cells of irradiated p53Δ24/- or p53+/- mice, and quantified the expression of a subset of blood-related candidate genes in the bone marrow cells of p53Δ31/Δ31 mice (prone to bone marrow failure) and WT mice (new Figure S8 and Table S19). For genes related to brain development, we included the analysis of RNAseq data from human cortical neural progenitors infected by the Zika virus (ZIKV) or mock-infected (Table S28). These RNAseq analyses were added as an additional screening criterion in our approach, which significantly increased the relevance of the target genes identified.

      In terms of statistical analysis, the hypergeometric test should be applied to assess significant enrichment of genes for example with CDE/CHR regions within the previously identified lists.

      In the revised manuscript, we precisely mapped the DREAM binding sites in 50 bp windows within regions bound by E2F4 and/or LIN9, an analysis included in new Figure 3a. We then compared the distribution of DREAM binding sites at the level of ChIP peaks to their distribution over the entire genome and found a > 1300-fold enrichment of these sites at ChIP peaks. This significant enrichment (f = 3 × 10⁻²³⁹ in a hypergeometric test) is most likely underestimated because mouse-human DNA sequence conservations were not determined for putative DBS over the full genome. These new analyses clearly reinforce our previous conclusions.
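
      For readers less familiar with this kind of test, the sketch below shows how a fold enrichment and a hypergeometric tail probability can be computed for sites falling in windows that overlap ChIP peaks. It is only an illustration: the function names and the window counts are hypothetical, they do not reproduce the figures reported above, and the exact definition of the background used in the manuscript is not assumed here.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for a hypergeometric variable.

    M: total number of windows (background)
    n: windows that contain a predicted binding site
    N: windows that overlap a ChIP peak
    k: windows that contain a site AND overlap a peak
    """
    upper = min(n, N)
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / comb(M, N)


def fold_enrichment(k, M, n, N):
    """Observed site frequency at peaks divided by background frequency."""
    return (k / N) / (n / M)


# Hypothetical toy counts (not taken from the manuscript).
M, n, N, k = 10_000, 40, 200, 20

p_value = hypergeom_sf(k, M, n, N)
enrichment = fold_enrichment(k, M, n, N)
```

      Exact integer arithmetic via math.comb avoids any dependency, at the cost of speed; for genome-scale counts a library routine such as scipy.stats.hypergeom would be the more practical choice.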

      Minor Comments

      References are required for the genes listed which play a role in the diseases of interest.

      In the revised manuscript, references are provided for genes which play a role in the diseases of interest. Due to the large number of added references, these were included in a new supplementary table, Table S36.

      This paper would benefit from the inclusion of summary schematics and tables throughout (rather than relying only on somewhat unwieldy heatmaps which show little other than all these genes are co-ordinately regulated), this could include summaries of the methods applied, gene or CDE/CHR inclusion criteria, and Venn diagrams indicating the subsets of final genes identified through this approach.

      We thank the referee for this suggestion. In the revised manuscript we provide a Venn-like diagram of the different steps of our approach (new Figure 3c), as well as tables listing the genes retained after each step of the selection (new Tables S17 and S26) and these additions improve the clarity of our manuscript.

      Reviewer #1 (Significance):

      In its current form this is a very limited study that would require significant additional work to move conclusions beyond correlation and hypothesis generation.

      Overall, while limited largely to target prioritisation, this research nicely exemplifies how genes affected by the p53-DREAM pathway can be robustly identified, providing a potential resource for individuals working on this pathway or on abnormal haematopoiesis and brain abnormalities. These results are complementary to work previously published by Fischer et al, which has been referenced throughout the analysis (highlighting Target Gene Regulation Database p53 and DREAM target genes) and discussion.

      This paper will be of interest to researchers of blood/neurological diseases who can assess if these genes are dysregulated in their datasets, or those investigating the p53-DREAM pathway. This work represents a useful resource detailing genes affected by this pathway in these disease settings, however researchers of the p53-DREAM pathway may find this paper useful when planning an approach to identify and prioritise genes of interest.

      We thank the reviewer for considering that our study represents a useful resource for researchers working on the p53-DREAM pathway, abnormal haematopoiesis and brain abnormalities, because it was exactly the purpose of our work. As mentioned above, we think that a study bridging the gap between DREAM experts and bone marrow or microcephaly specialists should be particularly useful.

      We also agree with the referee that our approach could be used to identify DREAM targets relevant to other disease settings, and we now mention this clearly in the revised manuscript.

      While our results are complementary to work previously published by Fischer et al and included in the Target gene regulation database, in the revised manuscript we discuss the novelty of our results in more detail, notably by performing additional analyses. For example, our method identified bipartite DREAM binding sites for 151 candidate DREAM targets (of which 56 genes were not previously mentioned by Fischer et al.). We now provide a detailed mapping (using 50 bp windows) of the bipartite DREAM binding sites we identified relative to ChIP peaks for DREAM subunits, and performed a similar mapping of the E2F and CHR sites included in the Target gene regulation database. Our predicted DREAM binding sites coincided with ChIP peaks more frequently (Figure 3a) than the predicted E2F or CHR sites from the Target gene regulation database (Figure S11), which further indicates the usefulness of our study as a resource.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The authors used various systems including Hoxa9-inducible BMCs, human and mouse cells, WT and p53 knockout MEF, glioblastoma cells to screen p53-DREAM targets and observed distinct findings for each system. Since different cell types have various p53 activation and p53 target genes expression, the authors might want to select proper cell type(s) to screen p53-DREAM target genes and design experiments to confirm that these genes are really p53-DREAM target genes.

      We agree that additional data from relevant cells or tissues were required to strengthen our conclusions. As mentioned in response to referee #1, in the revised manuscript we evaluated the relevance of candidate target genes related to blood ontology terms by integrating an additional screening step in our method, corresponding to the analysis of RNAseq dataset GSE171697, with data from hematopoietic stem cells of unirradiated or irradiated WT mice and unirradiated p53 KO mice, as well as RNAseq dataset GSE204924, with data from splenic cells of irradiated p53Δ24/- or p53+/- mice. As for genes related to brain development, we included the analysis of RNAseq datasets GSE78711 and GSE80434 for validation, two datasets from human cortical neural progenitors infected by the Zika virus or mock-infected. Together, the 4 datasets provide evidence for a p53-dependent downregulation in blood- and brain-relevant settings (new Tables S19 and S28).

      Importantly, in the revision we also compared our list of 151 genes appearing as the best p53-DREAM candidates with the results of Magès et al., who analyzed, in murine cells with a CRISPR-mediated KO of Lin37 (a subunit of DREAM), the transcriptomic changes that follow a reintroduction of Lin37. This comparison is detailed in the discussion section, with the new Figure S10 and Table S35. We mention: “Our list of 151 genes overlaps only partially with the list of candidate DREAM targets obtained with this approach, with 51/151 genes reported to be downregulated in Lin37-rescued cells [17]. To better evaluate the reasons for this partial overlap, we extracted the RNAseq data from Lin37 KO and Lin37-rescued cells and focused on the 151 genes in our list. For the 51 genes that Mages et al. reported as downregulated in Lin37-rescued cells, an average downregulation of 14.8-fold was observed (Figure S10, Table S35). Furthermore, when each gene was tested individually, a downregulation was observed in all cases, statistically significant for 47 genes, and with a P value between 0.05 and 0.08 for the remnant 4 genes (Table S35). By contrast, for the 100 genes not previously reported to be downregulated in Lin37-rescued cells, an average downregulation of 4.7-fold was observed (Figure S10, Table S35), and each gene appeared downregulated, but this downregulation was statistically significant for only 35/100 genes, and P values between 0.05 and 0.08 were found for 23/100 other genes (Table S35). These comparisons suggest that, for the additional 100 genes, a more subtle decrease in expression, together with experimental variations, might have prevented the report of their DREAM-mediated regulation in Lin37-rescued cells.”

      This comparison provides additional evidence that the 151 candidate target genes we identified are bona fide DREAM targets.

      Specific comments:

      The authors need to describe and define HSC and Diff in Figure 1.

      This has been corrected in the revised manuscript. “HSC” was replaced by “Hematopoietic Stem / Progenitor cells (+OHT)” and “Diff” was replaced by “Differentiated cells (5 days – OHT)”.

      Are Figure 1B and 1D list genes p53 targets in bone marrow cells?

      In the revised manuscript, we now analyzed RNAseq data to address this point. The question refers to lists of telomere-related genes (Figure 1b in both versions of the manuscript) and Fanconi-related genes (Figure 1d in the previous version, now Figure S2a), but could also apply to other lists of genes related to blood ontology terms (Figures S3-S5 in the revised manuscript). As mentioned in response to referee #1, in the revised manuscript we integrated an additional screening step in our method, corresponding to the analysis of RNAseq datasets specific to blood cells. We analyzed dataset GSE171697, with RNAseq data from hematopoietic stem cells of unirradiated WT or p53 KO mice, or irradiated WT mice, as well as dataset GSE204924, with RNAseq data from splenic cells of irradiated p53Δ24/- or p53+/- mice. The latter dataset appeared interesting because p53Δ24 is a mouse model prone to bone marrow failure and the spleen is a hematopoietic organ in mice. Furthermore, we also verified that the expression of a subset of blood-related candidate genes was decreased in the bone marrow cells of p53Δ31/Δ31 mice (prone to bone marrow failure) compared to bone marrow cells from WT mice, a result presented in the new Figure S8.

      Where is the detailed information for mouse and human cells in Figure 1 and Figure 2?

      In the first draft of the manuscript, supplementary tables provided precise values for ChIP binding. In the revised manuscript, we also provide the precise values for gene expression after bone marrow cell differentiation, as well as p53 regulation scores from the Target gene regulation databases. This additional information is included in the new Tables S1, S5, S8, S11, S14, S20 and S23.

      Are Figure 3B list genes also p53 target genes in other cell types such as bone marrow cells and glioblastoma?

      For genes in Figure 3B of the previous version of the manuscript (now Figure 2B in the revised version), we now provide evidence that the blood-related genes are less expressed in the bone marrow cells of p53Δ31/Δ31 mice (mice with increased p53 activity and prone to bone marrow failure) compared to bone marrow cells from WT mice. This result is presented in the new Figure S8. For the brain-related genes of the same Figure, evidence of their p53-mediated regulation is provided by the RNAseq datasets GSE78711 and GSE80434, from human cortical neural progenitors infected by the Zika virus or mock-infected (analyzed in the new Table S28). Evidence that decreased p53 activity in glioblastomas correlates with increased expression of the brain-related genes of the same Figure is provided in supplementary Table S37.

      Does BRD8high have high p53 and p21?

      We now clearly show, in both Figure 4a and Table S37, that glioblastoma cells with high BRD8 exhibit a decreased expression of CDKN1A/p21 and other genes known to be transactivated by p53 (BAX, GADD45A, MDM2, PLK3), consistent with the fact that BRD8 attenuates p53 activity.

      Are genes listed in Figure 4B all p53 target genes? can some validation be done?

      For genes in Figure 4B, in the revision we focused on the genes that appeared more relevant, i.e. the 77 genes mutated in diseases with microcephaly or cerebellar hypoplasia. All the genes in Figure 4B are repressed in neural progenitors upon infection by the Zika virus, a virus known to cause p53 activation in those cells. This is reported in the new Table S28.

      Reviewer #2 (Significance):

      This is a potentially interesting study. The major limitation is the absence of validation from the screening. This study would definitely benefit the research community as long as some of the key findings are validated.

      We thank the referee for this comment. We hope the new evidence in this revision provides the validation requested by the referee.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In their work submitted to Review Commons, Rakotopare et al. aim to identify p53-DREAM target genes associated with blood or brain abnormalities. To this end, they utilize published data generated with a cellular model that results in cell-cycle exit and differentiation of murine bone marrow progenitor cells upon inducible expression of Hoxa9. By analyzing this gene expression data set published by Muntean et al., they find that multiple of the 3631 genes which are downregulated more than 1.5-fold in differentiated BMCs are also mutated in several disorders connected to proliferation and differentiation defects during hematopoiesis and brain development. By screening ChIP-seq data sets available at ChIP-Atlas, they find that the promoters of many of these genes are bound by DREAM complex components, and most of them were identified as genes indirectly repressed by p53 before (Fischer et al. 2016, targetgenereg.org). They then use a computational approach to identify putative CDE/CHR DREAM-binding sites in the promoters of 372 genes associated with blood/brain abnormalities which are downregulated in differentiated BMCs and bound by DREAM components. Out of the 173 candidate genes, they select twelve to analyze whether mutation of the putative DREAM binding sites results in increased activity of the promoters in luciferase reporter assays. The authors conclude that their findings suggest a general role for the p53-DREAM pathway in regulating hematopoiesis and brain development.

      While the study supports a large body of publications proving that repression of cell cycle genes by the DREAM complex is crucial for cell cycle arrest and exit, it is noted that none of the main conclusions here are unexpected or particularly exciting. All the analyses are based on data sets that compare gene expression in highly proliferative cells with cells that underwent terminal cell cycle exit. Thus, a large portion of the genes that are downregulated in differentiated BMCs are cell cycle genes and well-established targets of DREAM and E2F:RB complexes. Furthermore, it is not surprising that some of these pro-proliferative genes are mutated in diseases connected to proliferation defects like anemias or microcephaly.

      We agree with the referee that the DREAM complex is well known to regulate cell cycle genes – in fact, this is what we mention in the first sentence of our introduction in both versions of our manuscript. However, as we already pointed out in response to Referee #1, many scientists or clinicians specialized in bone marrow failure syndromes or microcephaly diseases are not familiar with the p53-DREAM pathway, and we think our study will be particularly useful to them. Furthermore, our strategy relying on disease-based ontology terms rather than cell cycle regulation led us to identify many DREAM targets that were not reported in previous studies, and our positional frequency matrices led us to identify DREAM binding sites not predicted by previous approaches. As discussed below, our revised manuscript provides a more detailed comparison of our findings with those from previous studies.

      Additionally, I am not very enthusiastic about this manuscript because of several major concerns:

      1. The authors draw conclusions about the p53-DREAM pathway based on data that was generated in a cellular differentiation model without convincingly showing that p53 plays a central role in gene repression in this experimental setup.

      (A) Rakotopare et al. define p53-DREAM target genes based on RNA expression data from proliferating precursor cells and non-proliferating, differentiated BMCs (Muntean et al., 2010). This paper has not studied whether p53 gets activated in the particular experimental setup during Hoxa9-induced BMC differentiation. On page 4 of their manuscript, the authors state: "Consistent with the fact that BMC differentiation strongly correlates with p53 activation..." without citing any literature or explaining why this is supposed to be a fact. Furthermore, they imply that cell cycle gene repression in this model system depends on p53 because mRNA expression of the p53 targets p21 and Mdm2 was found to be increased in the differentiated cells (Fig. 1A, 5-fold and 2-fold, respectively). However, defining a large set of "p53-DREAM target genes" based on the moderate increase in mRNA levels of two genes that are known to be activated by p53 without showing any evidence that p53 is even involved in this effect during BMC differentiation is not appropriate.

      We agree that Muntean et al. did not study whether p53 gets activated when BMCs differentiate in the Hoxa9-ER system. We previously mentioned: “We observed that p53 activation correlated with cell differentiation in this system, because genes known to be transactivated by p53 (e.g. Cdkn1a, Mdm2) were induced, whereas genes repressed by p53 (e.g. Rtel1, Fancd2) were downregulated after tamoxifen withdrawal (Figure 1a)”. We had provided examples for 2 genes transactivated and 2 genes repressed, but clearly mentioned that they were given as examples. In the revised manuscript, we provide additional evidence with a new supplementary Figure that includes changes in expression for 15 additional genes known to be transactivated by p53, and 5 additional genes known to be repressed by p53 (Figure S1). In total, we now correlate HSC differentiation with p53 activation based on the expression of 24 well-known p53-regulated genes, which we hope is more convincing.

      In addition, we changed our phrasing and mention “Consistent with the notion that BMC differentiation strongly correlates with p53 activation in this system, 72 of these 76 genes have negative score(s) in the Target gene regulation (TGR) database”.

      (B) Interestingly, p53 is among the genes that get repressed on mRNA level in differentiated BMCs (Fig. 1B; Trp53), and the authors also identify the DREAM components E2F4 and LIN9 as bound to the p53 promoter by screening ChIP-Atlas data (Fig. 1C). Given that p53 has never been described as a DREAM target, I find this rather surprising and it makes me wonder whether appropriate parameters were selected for analyzing the ChIP data, particularly since the authors do not provide binding data for sets of non-cell cycle genes as a negative control.

      We retrieved ChIP data from the ChIP Atlas database without any specific parameters, thus in a completely unbiased manner. Importantly however, for reasons detailed in the manuscript, we clearly mentioned that total ChIP scores <979/4000 were considered too low to reflect significant DREAM binding. The ChIP score for Trp53 was 630, which rapidly led us to eliminate this gene from our screen.
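      As an illustration only, the score criterion described above can be sketched in a few lines of Python. The threshold (979 out of 4000) and the Trp53 score (630) come from the response; the function names and the second gene, "GeneX", with its score are hypothetical and serve only to show a gene that would pass the filter.

```python
# Threshold quoted in the response: total ChIP scores below 979 (out of 4000)
# were considered too low to reflect significant DREAM binding.
MIN_TOTAL_CHIP_SCORE = 979


def passes_dream_binding_filter(total_chip_score: int,
                                threshold: int = MIN_TOTAL_CHIP_SCORE) -> bool:
    """Return True when the total ChIP score reaches the threshold."""
    return total_chip_score >= threshold


def filter_candidates(scores: dict[str, int]) -> list[str]:
    """Keep only the genes whose total ChIP score passes the filter."""
    return [gene for gene, score in scores.items()
            if passes_dream_binding_filter(score)]


# The Trp53 score is taken from the response; "GeneX" is a hypothetical entry.
example_scores = {"Trp53": 630, "GeneX": 1500}
kept = filter_candidates(example_scores)
print(kept)
```

      With this rule, Trp53 is discarded because 630 is below 979, which matches the outcome described in the response.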

      This ChIP score criterion was already mentioned in the previous version of our manuscript, but we think the addition of a Venn-like diagram (Figure 3c) and summary tables (S17 and S26) in the revised manuscript will probably make it easier to understand.

      (C) Finally, the authors utilize the targetgenereg.org database to show that many of the genes they describe as p53-repressed were already identified as p53 targets. This database (Fischer et al. 2016) was created by performing a meta-analysis integrating a plethora of RNA-seq and ChIP-seq datasets with the aim to identify whether a particular gene gets up- or downregulated by p53, shows cell-cycle-dependent expression, is a DREAM/MuvB or E2F:RB target, etc. For example, 57 datasets analyzing p53-dependent RNA expression in human and 15 datasets generated with mouse cells were included, and a positive or negative score shows in how many of these experiments the gene was found to be up (positive score) or downregulated (negative score). Combining a large number of datasets in such a study is very helpful to get an idea if a gene is indeed generally regulated by a transcription factor, or if it just showed up in a few experiments - either as a false positive or because the regulation depends on a particular biological setting. The authors find most of the genes they identify as repressed in differentiated BMCs also as downregulated by p53 in targetgenereg.org, however, it remains unclear what parameters they used to define a gene as p53-repressed. For example, in the caption of Fig. 1C, they state: "According to the Target gene regulation database, 72/76 genes are downregulated upon mouse and/or human p53 activation." The four exemptions are SLX1B (human score: 0, mouse score : na), PML (+41, +9), RAD50 (0, na), and TNKS2 (+17, +4). However, there are several other genes that do not appear to be generally repressed by p53, e.g. HMBOX1 (+4, -2); UPF1 (+1, -2), SMG6 (+18, -2), CTC1 (-5, +11), etc. Thus, without providing details regarding the parameters they use to define p53-target genes, such statements are rather misleading. An easy way to solve this problem would be to show the p53 scores in the tables together with the E2F4/LIN9 ChIP data.

      All the genes mentioned as downregulated by p53 had a negative TGR score in human and/or mouse cells. In the revised manuscript, we mention clearly what a negative TGR score means, by stating: “Consistent with the notion that BMC differentiation strongly correlates with p53 activation in this system, 72 of these 76 genes have negative p53 expression score(s) in the Target gene regulation (TGR) database [23], which indicates that they were downregulated upon p53 activation in most experiments carried out in mouse and/or human cells (Figure 1b, Table S1).” We agree with the referee that adding precise TGR scores is informative. In the revised manuscript, we provide the TGR scores for all the genes analyzed, as part of the new supplementary Tables S1, S5, S8, S11, S14, S20 and S23, together with their expression levels in undifferentiated or differentiated cells (as requested by Referee #2). The ChIP data are provided in separate tables (Tables S2, S3, S6, S7, S9, S10, S12, S13, S15, S16, S21, S22, S24 and S25).
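      The rule stated above (a negative TGR score in human and/or mouse cells) can be made explicit with a short sketch. The scores below are the (human, mouse) values quoted by the referee, with "na" represented as None; the function name is hypothetical and this is not the code used for the manuscript.

```python
from typing import Optional


def has_negative_tgr_score(human: Optional[int], mouse: Optional[int]) -> bool:
    """True if at least one available score (human and/or mouse) is negative."""
    return any(score is not None and score < 0 for score in (human, mouse))


# (human score, mouse score) as quoted in the referee's comment; None means "na".
tgr_scores = {
    "SLX1B": (0, None),
    "PML": (41, 9),
    "RAD50": (0, None),
    "TNKS2": (17, 4),
    "HMBOX1": (4, -2),
    "UPF1": (1, -2),
    "SMG6": (18, -2),
    "CTC1": (-5, 11),
}

negative = sorted(gene for gene, (human, mouse) in tgr_scores.items()
                  if has_negative_tgr_score(human, mouse))
print(negative)
```

      Under this "and/or" rule, the four exceptions named in the figure legend have no negative score, whereas HMBOX1, UPF1, SMG6 and CTC1 each have one. This is why reporting the scores themselves, as the referee suggested, lets readers judge the mixed cases.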

      1. The authors define a large set of genes containing "CDE-CHR" promoter elements and thereby ignore how these elements are defined and what properties they have.<br /> (A) At the beginning of the introduction, the authors state: "The DREAM complex typically represses the transcription of genes whose promoter contain a bipartite CDE/CHR binding site, with a cell cycle-dependent element (CDE) bound by E2F4 or E2F5, and a cell cycle gene homology region (CHR) bound by LIN54, the DNA binding subunit of MuvB (Zwicker et al., 1995; Müller and Engeland, 2010)."<br /> This statement is incorrect. The authors ignore that the CDE/CHR tandem site is just one of four promoter elements that have been shown to recruit DREAM for the transcriptional repression of several hundred genes. It has been studied in detail that DREAM can bind to the following promoter sites:<br /> (I) CHR elements - bound by DREAM via LIN54; also bound by the activator MuvB complexes B-MYB-MuvB and FOXM1-MuvB which results in maximum gene expression in G2/M<br /> (II) CDE-CHR tandem elements - like (I) but binding of DREAM can be stabilized via E2F4/DP interacting with a truncated E2F binding site. Since CDE elements do not represent functional E2F sites, E2F:RB complexes do not bind.<br /> (III) E2F binding sites - bound by DREAM via E2F4/DP; also bound by E2F:RB complexes and activator E2Fs which results in maximum gene expression in G1/S<br /> (IV) E2F-CLE tandem elements - like (III) but binding of DREAM can be stabilized via LIN54 interacting with a non-canonical CHR-like element. Since CLE elements do not represent functional CHR sites, B-MYB-MuvB and FOXM1-MuvB do not bind.<br /> Thus, these promoter sites have different functions and can be clearly distinguished from each other based on their properties - a fact that is completely ignored by the authors. 
Since the authors do not differentiate between G1/S and G2/M expressed genes and (CDE)-CHR and E2F-(CLE) sites, they identify CDE-CHR elements in G1/S genes that are functional E2F-(CLE) sites. A good example of this is the Rad51ap1 gene (and also the Rad51 gene that the Toledo lab described before as a CDE-CHR gene (Jaber et al. 2016)): these genes get expressed in G1/S and the promoters contain highly conserved E2F sites (parts of which the authors define as CDEs), and CLEs (which the authors define as CHRs). Furthermore, E2F:RB complexes bind to the promoters. Again: even though (CDE)-CHR and E2F-(CLE) sites both bind DREAM, they are otherwise functionally different in their ability to recruit non-DREAM complexes.

      We agree that the previous version of our manuscript should have presented the different types of DREAM binding sites in more detail, and we have corrected this in the revised manuscript. We now mention in the introduction that “The DREAM complex was initially reported to repress the transcription of genes whose promoter sequences contain a bipartite binding motif called CDE/CHR [19,20] (or E2F/CHR [21]), with a GC-rich cell cycle dependent element (CDE) that may be bound by E2F4 or E2F5, and an AT-rich cell cycle gene homology region (CHR) that may be bound by LIN54, the DNA-binding subunit of MuvB [19,20]. Later studies indicated that DREAM may also bind promoters with a single E2F binding site, a single CHR element, or a bipartite E2F/CHR-like element (CLE), and concluded that E2F and CHR elements are required for the regulation of G1/S and G2/M cell cycle genes, respectively [14,22].”

      We hope that the referee will agree with this complete yet concise way of presenting DREAM binding sites. Importantly, we agree that CDE/CHR and E2F/CLE are sites bound by different non-DREAM complexes, but both sites are bound by DREAM, so it makes perfect sense to use them together to define positional frequency matrices for DREAM binding predictions. We would also like to point out that terms used to define DREAM binding sites may vary in the literature. For example, to our knowledge Müller et al. were the first to propose a clear distinction between “CDE/CHR” and “E2F/CLE” sites (Müller et al. (2017) Oncotarget 8, 97737-97748), yet Müller recently co-authored a review in which these two distinct terms were not used, but were replaced by a single, apparently more generic term of “E2F/CHR” (Fischer et al., (2022) Trends Biochem. Sci. 47, 1009-1022). In the revised manuscript we now clearly mention that we designed our positional frequency matrices to search for “bipartite DREAM binding sites”, i.e. sites that might be referred to as CDE/CHR, E2F/CLE or E2F/CHR sites in various publications.

      (B) The authors identified putative CDE-CHR in the promoters of genes by building two position weight matrices (PWMs) based on 10 or 22 "validated CDE-CHR elements". However, since they include several genes that are clearly expressed in G1/S and contain E2F-(CLE) sites (e.g. Mybl2/B-myb, Rad51, Fanca, Fen1), it is not surprising that they identify a lot of putative CDE-CHR sites in genes that do not contain such elements.

      As discussed above, both CDE/CHR and E2F/CLE are bipartite DREAM binding sites, and we now clearly state that we used bipartite DREAM binding sites to generate our positional frequency matrices and predict DREAM binding.
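      For readers unfamiliar with positional frequency matrices, the general technique can be sketched as follows. This is a minimal, generic example and not the PFM10 or PFM22 matrices of the manuscript: the aligned sites and the promoter fragment are hypothetical toy sequences (a GC-rich part followed by an AT-rich part), and the log-odds scoring with a pseudocount and a uniform background is an assumption made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pfm(aligned_sites):
    """Build a positional frequency matrix from equal-length aligned sites."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for position in range(length):
        counts = Counter(site[position] for site in aligned_sites)
        pfm.append({base: counts.get(base, 0) / len(aligned_sites)
                    for base in BASES})
    return pfm


def score_window(pfm, window, pseudocount=0.01, background=0.25):
    """Sum of per-position log-odds scores against a uniform background."""
    return sum(math.log2((column[base] + pseudocount) / background)
               for column, base in zip(pfm, window))


def best_match(pfm, sequence):
    """Return (score, start index) of the best-scoring window in the sequence."""
    width = len(pfm)
    return max((score_window(pfm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


# Hypothetical toy sites, not taken from the manuscript.
toy_sites = ["GGCGGTTTGAA", "GGCGCTTTGAA", "CGCGGTTTAAA"]
toy_pfm = build_pfm(toy_sites)

# Hypothetical promoter fragment containing one copy of the first toy site.
score, start = best_match(toy_pfm, "ATATGGCGGTTTGAACCC")
print(start, round(score, 2))
```

      Because the matrix is built from bipartite sites as a whole, a window scores well only when both the GC-rich and the AT-rich parts match, which is the property relied upon when searching for bipartite DREAM binding sites.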

      (C) Finally, in the discussion, the authors state: "A recent update (2.0) of the Target gene regulation database of p53 and cell cycle genes (www.targetgenereg.org) was recently reported to include putative DREAM binding sites for human genes (Fischer et al., 2022). However, this update only suggests potential E2F or CHR binding sites independently, a feature of little help to identify CDE/CHR elements. For example, targetgenereg 2.0 suggests several potential E2F sites, but no CHR site close to the transcription start site of FANCD2, despite the fact that we previously identified a functionally CDE/CHR element near the transcription start site of this gene (Jaber et al., 2016)." This statement highlights again that the authors don't seem to be aware of what specific properties distinct DREAM binding sites have, and that analyzing promoters for CHR and E2F sites separately generates much more meaningful results than the approach they chose. Also, the FANCD2 promoter binds DREAM as well as E2F:RB complexes and contains a highly conserved E2F binding site - which Jaber et al. mutated together with a potential downstream CLE element and named it "CDE/CHR".

      In the revised manuscript, we provide a more detailed comparison between the bipartite DREAM binding sites predicted with our positional frequency matrices for 151 genes and the separate E2F and CHR predicted sites reported in the Target gene regulation database for the same set of genes. We now mention: “The Target gene regulation (TGR) database of p53 and cell-cycle genes was reported to include putative DREAM binding sites for human genes, based on separate genome-wide searches for 7 bp-long E2F or 5 bp-long CHR motifs [23]. We analyzed the predictions of the TGR database for the 151 genes for which we had found putative bipartite DBS. A total of 342 E2F binding sites were reported at the promoters of these genes, but only 64 CHR motifs. The similarities between the predicted E2F or CHR sites from the TGR database and our predicted bipartite DBS appeared rather limited: only 14/342 E2F sites overlapped at least partially with the GC-rich motif of our bipartite DBS, while 27/64 CHR motifs from the TGR database exhibited a partial overlap with the AT-rich motif. Importantly, most E2F and CHR sites from the TGR database mapped close to E2F4 and LIN9 ChIP peaks, but only 16% of E2Fs (54/342), and 33% of CHRs (21/64) mapped precisely at the level of these peaks (Figure S11), compared to 55% (83/151) of our bipartite DBS (Figure 3a). Thus, at least for genes with bipartite DREAM binding sites, our method relying on PFM22 appeared to provide more reliable predictions of DREAM binding than the E2F and CHR sites reported separately in the TGR database. Importantly however, predictions of the TGR database may include genes regulated by a single E2F or a single CHR that would most likely remain undetected with PFM22, suggesting that both approaches provide complementary results.”
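      The proportions quoted above can be checked directly from the counts given in the text. The short sketch below only recomputes these fractions; the labels are shorthand introduced here for readability.

```python
# (sites mapping precisely at ChIP peaks, total sites), as quoted in the text.
counts = {
    "E2F sites (TGR database)": (54, 342),
    "CHR motifs (TGR database)": (21, 64),
    "bipartite DBS (PFM22)": (83, 151),
}

percentages = {label: round(100 * at_peak / total)
               for label, (at_peak, total) in counts.items()}

for label, percent in percentages.items():
    at_peak, total = counts[label]
    print(f"{label}: {at_peak}/{total} = {percent}%")
```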

      1. The experimental approach chosen to validate CDE-CHR elements in a set of twelve promoters by luciferase reporter assays is not adequate.<br /> (A) Since the authors introduce point mutations in putative CDE and CHR elements in parallel, it is impossible to identify functional CDE elements. As explained above, a functional CDE is not required for binding of MuvB complexes and gene repression, and mutating the CHR alone would already lead to a loss of DREAM binding and to de-repression of a promoter. Thus, without mutating both sites of CDE-CHR elements separately, it is impossible to provide evidence that a putative CDE is functional.<br /> (B) As the putative CDE-CHR elements identified by the authors with a computational approach can overlap with functional E2F-(CLE) elements, the authors inactivate such sites by introducing mutations which leads to loss of DREAM binding and upregulation of the promoters, however, because of the problems described above, this experimental approach in the best case identifies DREAM binding sites, but does not differentiate between (CDE)-CHR and E2F-(CLE) elements.

      Yes, we agree with this comment. As discussed above, our goal was to identify DREAM-binding sites, not to differentiate between CDE/CHR and E2F/CLE elements. In other words, we wanted to identify genes regulated by p53 and DREAM, but not distinguish between genes regulated by p53, DREAM and E2F/Rb versus those regulated by p53, DREAM and BMyb-MuvB or FoxM1-MuvB.

      (C) The authors analyze the activities of wild-type and mutant promoters in proliferating NIH3T3 cells. Since the mutated promoters showed increased activity (about 2-3 fold), which would be expected when binding of DREAM gets abolished, they conclude: "...these experiments indicated that we could identify functional CDE/CHRs for 12/12 tested genes." In addition to the problems described above, a slight upregulation of promoter activities caused by the introduction of multiple point mutations close to the TSS is not sufficient to verify these elements. The increase in activity could occur independent of DREAM-binding by unrelated mechanisms. The authors should at least analyze the activities of the promoters with and without induction of p53. A loss of p53-dependent repression of the mutated promoters would prove that the elements are essential for p53-dependent repression. Furthermore, there are several experimental approaches to analyze whether DREAM binds to the putative promoter element and whether the introduced mutations disrupt binding (ChIP, DNA affinity purification, etc.).

      In the revised manuscript, we show that the promoters of 7 of the tested genes, when cloned in luciferase reporter plasmids and transfected into NIH3T3 cells, exhibited a significant (> 1.4 fold) repression upon p53 activation by cell treatment with Nutlin, the Mdm2 antagonist. For these promoters, we showed that the p53-dependent repression was abrogated by mutating the identified DREAM binding site, which provided direct evidence that our positional frequency matrices can identify functionally relevant DREAM binding sites essential for p53-mediated repression. These experiments were added in Figures 2e and 2i.
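      The repression criterion mentioned above can be written out as a small sketch. Only the 1.4-fold threshold comes from the response. Defining fold repression as the ratio of untreated to Nutlin-treated luciferase activity is an assumption, the function names are hypothetical, and the statistical test behind "significant" is not modeled here.

```python
# Threshold quoted in the response (> 1.4-fold repression upon Nutlin treatment).
FOLD_REPRESSION_THRESHOLD = 1.4


def fold_repression(untreated: float, nutlin_treated: float) -> float:
    """Ratio of untreated to Nutlin-treated reporter activity (assumed definition)."""
    if nutlin_treated <= 0:
        raise ValueError("treated activity must be positive")
    return untreated / nutlin_treated


def is_repressed(untreated: float, nutlin_treated: float,
                 threshold: float = FOLD_REPRESSION_THRESHOLD) -> bool:
    """True when the fold repression exceeds the threshold."""
    return fold_repression(untreated, nutlin_treated) > threshold
```

      With hypothetical readings, a wild-type promoter going from 100 to 50 relative units would count as repressed, whereas a mutant promoter going from 100 to 90 would not. That is the pattern expected if the mutated site is required for p53-dependent repression.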

      Furthermore, as previously mentioned in response to referee #1, in the revised manuscript we precisely mapped the predicted DREAM binding sites for 151 genes in 50 bp windows within regions bound by E2F4 and/or LIN9, an analysis included in new Figure 3a. The distribution of these peaks clearly indicates that most predicted DREAM binding sites map precisely within a 50 bp window encompassing the ChIP peaks, which represents an enrichment of at least 1300-fold compared to the rest of the genome. This mapping strongly suggests that our predicted DREAM binding sites are functionally relevant.

      Importantly, as shown in the new Figure S11, we carried out a similar mapping of the predicted E2F and CHR sites reported in the Target gene regulation (TGR) database and found that our predicted DREAM binding sites co-mapped with E2F4/LIN9 ChIP peaks more frequently than the E2F and CHR sites of the TGR database, which supports the conclusion that our positional frequency matrices bring new and improved predictions for DREAM binding.

      1. Taken together, while over-simplifying mechanisms of cell cycle gene regulation, the authors largely ignore recent findings and publications regarding gene regulation by p53, E2F:RB, and DREAM/MuvB complexes:<br /> (A) Publications that show how DREAM binds to (CDE)-CHR sites and that experimentally defined a consensus motif for CHR elements (e.g. PMID: 27465258, PMID: 25106871).<br /> (B) Publications that identify p53-DREAM target genes by activating p53 in cells with or without functional DREAM complex (e.g. PMID: 31667499, PMID: 31400114).<br /> (C) Identification and comparison of (CDE)-CHR and E2F-(CLE) DREAM binding sites that have distinct functions in the activation of cell-cycle expression in G1/S and G2/M (e.g. PMID: 29228647, PMID: 25106871).<br /> These findings have been summarized in several review articles (e.g. PMID: 29125603, PMID: 28799433, PMID: 35835684). All of them describe the mechanisms I have mentioned above in detail, and since Rakotopare et al. cite one of the papers (Engeland 2018), I wonder even more why they did not design their experiments based on current knowledge.

      The points (A) and (C) of this comment were largely discussed in our response to points 2 and 3 of the same referee. Briefly, in the revised manuscript we clearly mention CDE/CHR, E2F/CLE and E2F/CHR sites, as well as the functional differences between E2F and CHR sites with regards to cell cycle regulation, but all these sites were considered together in our positional frequency matrices because our goal was to identify genes regulated by p53 and DREAM, not to distinguish between genes regulated by p53, DREAM and E2F/Rb versus those regulated by p53, DREAM and BMyb-MuvB or FoxM1-MuvB.

      Regarding point (B) of this comment, in the revised manuscript we performed a detailed comparison of our results with those of Mages et al. who analyzed, in murine cells with a CRISPR-mediated KO of Lin37 (a subunit of DREAM), the transcriptomic changes that follow a reintroduction of Lin37 (Mages et al. (2017) elife 6, e26876). This comparison is detailed in the discussion section, with New Figure S10 and Table S35. As mentioned in response to referee #2, this comparison is perfectly consistent with DREAM regulating the 151 genes for which we identified DREAM binding sites.

      Minor concerns:

      1. The authors state: "Importantly however, the relative importance of the p53-p21-DREAM pathway (called below p53-DREAM) remains controversial, because multiple mechanisms were proposed to account for p53-mediated gene repression (Peuget and Selivanova, 2021)." Even though Peuget & Selivanova do not agree that genes get repressed in response to p53 activation exclusively by the p21-DREAM pathway, they do not question that this mechanism is essential for the p53-dependent repression of a core set of cell cycle genes. Since I am also not aware of any publications that challenge the importance of the p53-p21-DREAM pathway, I do not agree with this statement.

      As the referee pointed out, in the first version of the manuscript we wrote that “the relative importance of the p53-p21-DREAM pathway (called below p53-DREAM) remains controversial, because multiple mechanisms were proposed to account for p53-mediated gene repression (Peuget and Selivanova, 2021)”. The term “relative” was crucial in this sentence, because we meant that the relative proportion of genes regulated by DREAM remained controversial. It seems to us that the title of the review by Peuget & Selivanova (“p53-dependent repression: DREAM or reality?”) emphasizes this controversy. Nevertheless, in the revised manuscript, we now mention: “The relative importance of this pathway remains to be fully appreciated, because multiple mechanisms were proposed to account for p53-mediated gene repression [18]”. We hope the referee will find this phrasing more acceptable.

      1. Some parts of the manuscript are tiring to read - for example, pages 6, 7, and 8 which contain long listings and numbers of genes that are downregulated in differentiated BMC, found to be mutated in various disorders, bind DREAM components, were identified as downregulated by p53, etc. The authors may consider combining central parts of these data in a table that they show in the main manuscript which would make it easier to digest the information and at the same time significantly shorten the manuscript.

      We apologize if some parts of the article were tiring to read. We hope that the addition of Tables S17 and S26, as well as the Venn-like diagram in Figure 3c, will improve the reading of the manuscript.

      1. The supplementary tables (S1-S26) are combined in one Excel file with multiple tabs. The authors should label the tabs accordingly to make it easier for the reader to find a particular table.

      We labelled the Excel tabs in the revised manuscript, as suggested.

      1. At the end of page 6, the authors show that 17 genes found to be downregulated in differentiated BMCs are mutated in multiple bone marrow disorders, however, since they don't include references, it remains unclear where these mutations were originally described.

      In the revised manuscript, we included a supplementary table (Table S36) with appropriate references for blood and/or brain related phenotypes for the 106 genes associated with blood or brain abnormalities.

      1. On page 9, the authors state: "As a prerequisite to luciferase assays, we first verified that the expression of these genes, as well as their p53-mediated repression, can be observed in mouse embryonic fibroblasts (MEFs), because luciferase assays rely on transfections into MEFs (Figure 3b)." The authors don't explain why luciferase assays rely on transfections into MEFs and based on the caption of Fig. 3C, the luciferase assays were not performed in MEFs, but in NIH3T3 cells: "WT or mutant luciferase reporter plasmids were transfected into NIH3T3 cells..."

      According to the American Type Culture Collection (ATCC), the NIH3T3 cell line is a mouse embryonic fibroblast (MEF) cell line, which explains why we had tested the expression of candidate target genes in MEFs. However, this cell line exhibits an attenuated p53 pathway, which improves cell survival after transfection but leads to decreased p53-mediated repression. These points are now clearly mentioned in the text and in a new supplemental Figure (Figure S9).

      Reviewer #3 (Significance):

      While the study supports a large body of publications proving that repression of cell cycle genes by the DREAM complex is crucial for cell cycle arrest and exit, it is noted that none of the main conclusions here are unexpected or particularly exciting. All the analyses are based on data sets that compare gene expression in highly proliferative cells with cells that underwent terminal cell cycle exit. Thus, a large portion of the genes that are downregulated in differentiated BMCs are cell cycle genes and well-established targets of DREAM and E2F:RB complexes. Furthermore, it is not surprising that some of these pro-proliferative genes are mutated in diseases connected to proliferation defects like anemias or microcephaly.

      Again, we agree with the referee that the DREAM complex is well known to regulate cell cycle genes, but many scientists or clinicians specialized in bone marrow failure syndromes or microcephaly diseases are not familiar with the p53-DREAM pathway, and we think our study will be particularly useful to them. As for DREAM specialists, our strategy relying on disease-based ontology terms rather than cell cycle regulation led to the identification of many DREAM targets that were not reported in previous studies, and our positional frequency matrices identified DREAM binding sites not predicted by previous approaches. We hope that, by considering all these points together, the referee will acknowledge that our study provides a valuable resource for different types of readerships.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      1) It is interesting MxDnaK1 seems to prefer cytosolic proteins while Mx-DnaK2 prefers inner membrane proteins. The domain-swapping experiments seem to suggest that the NBD is important for this difference. How NBD is important is not addressed. Is it due to ATP hydrolysis, NBD-SBD interaction, or co-chaperone interactions?

      Answer: Thank you for your comments. We speculate that co-chaperone interactions might be the key factor contributing to substrate differences. According to the working principle of Hsp70, its functional diversity is largely determined by substrate differences. Co-chaperones, such as JDPs, play a crucial role in this process because they can bind substrates and facilitate their targeted delivery. Therefore, much of the functional diversity of the HSP70s is driven by a diverse class of JDPs [1,2]. We found that the NBD plays an important role in co-chaperone recognition by the MxDnaKs. Additionally, it is generally accepted that the efficiency of ATP hydrolysis does not significantly impact substrate recognition by Hsp70. Furthermore, if the NBD-SBD interaction were crucial, substitution of either the NBD or the SBDβ domain would be expected to result in similar cell phenotypes, as both alterations disrupt the original NBD-SBDβ interaction. We believe that the DnaK proteins and their co-chaperones together determine the substrate spectra. We made corresponding modifications in the revised manuscript (Page 22; Lines 488-494 in the marked-up manuscript).

      2) About the interactome analysis, since apyrase was added to remove ATP, it's surprising multiple Hsp40s were found in their analysis. Hsp70-Hsp40 interaction is known to require ATP. This may suggest some of the proteins found in their interactome analysis are artifacts. The authors should perform negative controls for their interactome analysis, such as using a control antibody for their CO-IP and analyze any non-specific binding to their resin.

      In addition, since JDPs were pull-down, is it possible some of the substrates identified are actually substrates for JDPs, not binding directly to DnaKs?

      Answer: This is an interesting question. As you correctly noted, the interaction between Hsp70 and Hsp40 requires ATP. In our experiment, we used apyrase to remove ATP in order to promote tight binding of substrate by DnaK. This methodology was initially described by Calloni, G. et al. in 2012 [3], and those authors also identified the co-chaperone protein DnaJ, at a concentration higher than that of 77% of the interactors. In our opinion, incomplete removal of ATP could be the underlying cause of this phenomenon.

      We apologize for the insufficiently detailed description in the Methods. We did implement negative controls for each MxDnaK in order to eliminate potential non-specific interactions with the Protein A/G beads or the antibodies. Specifically, we conducted a CO-IP experiment without antibodies to assess any non-specific binding to the Protein A/G beads. To further investigate non-specific binding to the antibodies against MxDnaK2 and MxDnaK1, we used the mxdnak2-deleted mutant (strain YL2216) and the MxDnaK1 swapping strain carrying the MxDnaK2 SBDα (strain YL2204), respectively. Because the SBDα of MxDnaK1 was employed as the antigen to generate antibodies, YL2204 cannot be recognized by anti-MxDnaK1 (Figure S5). We believe these controls allowed us to evaluate and exclude non-specific interactions in our CO-IP. We have improved our description in the Methods. (Page 27; Lines 596-607)

      While one of the main functions of JDPs is to interact with unfolded substrates and facilitate their delivery to Hsp70, there may still be substrates that do not directly bind to Hsp70. It’s thus possible that some of the substrates identified only bind to JDPs. We made corresponding modifications in the revised manuscript. (Page 14; Line 290-292)

      3) For Figure S7, the pull-down assay used His6-tagged JDPs. Ni resin is known to bind Hsp70s non-specifically. It's not surprising DnaK showed up in all the pull-down lanes, especially considering how much DnaK was over-expressed. For some pull-down lanes, the amount of DnaK is much more than that of JDPs, further indicating artifact. The author should include negative controls such as JDPs without His6-tag or any irrelevant protein with His6 tag.

      Answer: Thank you for your suggestion. As you and another reviewer pointed out, there were some flaws in the experimental design of the pulldown assay. These include the non-specific binding of Hsp70 proteins to nickel resin, the absence of a negative control without a tag, and the inappropriate selection of the MBP tag. We therefore employed the nLuc assay as an alternative to the pulldown experiment to validate the interaction between DnaK and JDP (Figure S9). Although our manuscript employed nLuc to confirm protein dimerization, it is worth noting that the nLuc assay was originally devised for investigating protein interactions [4].

      4) For the proposed dimer formation in Fig. 4C, there are multiple bands above the monomer bands. What are these forms? It seems the majority of the Cys residues that could form disulfide bonds are in the NBD of MxDnaK2 since constructs with MxDnaK2-NBD form some sort of high-MW bands above the monomer. Does MxDnaK1-NBD also contain Cys at the analogous positions? The fact that MxDnaK1 didn't show disulfide-bonded bands doesn't mean it doesn't form dimer. It depends on where the Cys residues are.

      It's nice the authors did Fig. 4D. However, the authors should include a positive control to show how strong the signal is for a true interaction before interpreting their results.

      Answer: Thank you very much for your comments. In at least three independent experiments, we consistently observed two unidentified bands within the molecular weight range of 70-100 kDa during the purification of His6-MxDnaK2. These bands were intermediate in size between the monomeric and dimeric forms of His6-MxDnaK2, and disappeared upon DTT treatment. The compositions of these bands have been determined by LC/MS. The upper band included MxDnaK2 (65.3 kDa) and the anti-FlhDC factor of E. coli (WP_001300634.1, 27 kDa). In the lower band, we detected MxDnaK2 and the 50S ribosomal protein L28 of E. coli (WP_000091955.1, 9 kDa). Based on these findings, we conclude that these two additional bands result from the interaction between His6-MxDnaK2 and these two E. coli proteins. The related explanations have been added to the legend of Figure 5. (Page 42; Lines 938-942)

      We analyzed the presence of Cys residues in MxDnaK1 and MxDnaK2. The NBD of MxDnaK2 contains two Cys residues, located at positions 15 and 319. MxDnaK1-NBD contains a Cys at position 316, which is analogous to Cys319 of MxDnaK2. The position analogous to Cys15 of MxDnaK2 is occupied by a Val in MxDnaK1, which might be an important factor contributing to the inability of MxDnaK1 to form oligomers.

      Thanks for your suggestion to add the positive control. We re-performed the nLuc assays including a positive control (αSyn). According to the working principle of the nLuc assay, the amount of fluorescent substrate is limited. Therefore, even for proteins that interact with each other, the fluorescence value gradually decreases and reaches a plateau, similar to the negative control. This gradual decline in fluorescence is a significant indicator of protein interaction. In Figure 4D (Figure 5D in the revised version), we only presented the results of the first 20 minutes of detection. The complete two-hour detection results have been added in a supplementary figure (Figure S14).

      5) line 48: "human HSC70 and HSP70 are 85% identical, and the phenotypes of their knockout mutants are different, which is consistent with their largely nonoverlapping substrates" The authors completely ignored that the promoters of HSC70 and HSP70 are very different.

      Answer: This was an oversight on our part. Yes, HSC70 and HSP70 exhibit distinct expression patterns, which play important roles in their functional diversity. We modified the sentence in the new version. (Page 5; Line 58)

      6) Line 69: "The two PRK00290 proteins, not the other Myxococcus Hsp70s, could alternatively compensate the functions of EcDnaK (DnaK of E. coli) for growth." Please add references for this statement.

      Answer: Added, thanks.

      7) line 191: What's the mechanism for DnaK's role in oxidative stress? Is the disulfide bond formation in Fig. 4 related? Does disulfide-bond change the activity of DnaK?

      Answer: Thanks for your pertinent comments. Honestly, we do not know the mechanism behind MxDnaK2's role in oxidative stress. In our previous studies, we determined that the deletion of mxdnaK2 resulted in a longer lag phase after H2O2 treatment. Here, our aim was to investigate the impact of region swapping on the cellular function of MxDnaK2. In other bacteria, the critical role that DnaK plays in resistance to oxidative stress stems from the pleiotropic functions of this chaperone. By shortening the dwelling time that proteins spend in the unfolded state, the DnaK/DnaJ chaperone system minimizes the risk of metal-catalyzed carbonylation of the side chains of proline, lysine, arginine, and threonine residues, but none of these functions has been linked to the dimerization of DnaK 5-7.

      8) Fig. S9 seems redundant.

      Answer: Deleted, thanks.

      9) line 263, "but the NBD exchange was almost equal to the deletion of the gene with respect to phenotypes." But, the mutant has >50% activity in Fig. 3F.

      Answer: We apologize for the confusing description. Here, “phenotypes” refers to “cell phenotypes”. What we meant to say is that the NBD-swapped strain of either MxDnaK1 or MxDnaK2 presented cell phenotypes identical to those of the gene-deleted strain. As we had already described this result in detail earlier, we now consider this sentence redundant and have deleted it. (Page 17; Line 355-356)

      10) line 221-226: the logic is not quite clear.

      Answer: We apologize for the confusing description. In M. xanthus DK1622, MxDnaK1 is essential for cell survival, and an insertion of a second copy of mxdnaK1 in the genome is required for deletion of the in-situ gene. Thus, to verify whether the NBD region is required for the essentiality of MxDnaK1, we performed the region swapping of the in situ MxDnaK1 gene in the att::mxdnaK1 mutant (a DK1622 mutant containing a second copy of mxdnaK1 at the attB site), and successfully obtained the MxDnaK1 mutant swapped with the MxDnaK2 NBD region. The experiment indicated that the NBD of MxDnaK1 is essential for the cellular functions of the chaperone. We have added the information and modified the sentences in the manuscript. (Page 15; Line 308-319)

      Minor concerns:

      Please check spelling. There are some typos such as "HPPES" in the Methods.

      Answer: Corrected. Many thanks.

      My areas of expertise are protein biochemistry, genetics, and structural biology on heat shock proteins.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Major comments:

      The manuscript is very nice and interesting, although some of the authors' conclusions are perhaps not well supported by their data. For example:

      1) In the pulldown experiments the lack of interaction between 2747-MxDnaK2, 3015-MxDnaK2 and 1145-MxDnaK1 should be shown in order to support the conclusion made in line 197-198.

      Answer: This was an oversight on our part. As you and another reviewer pointed out, there are some flaws in the experimental design of the pulldown assay. These include the non-specific binding of Hsp70 proteins to nickel resin, the absence of a negative control without a tag, and the inappropriate selection of the MBP tag. Thus, we employed the nLuc assay as an alternative to the pulldown experiment to validate the interaction between DnaK and JDP (including the 2747-MxDnaK2, 3015-MxDnaK2 and 1145-MxDnaK1 interactions) (Figure S9). While our manuscript employed nLuc to confirm protein dimerization, it is worth noting that the nLuc assay was originally devised for investigating protein interactions 4.

      2) The only evidence that the NBD of MxDnaK1 is essential for bacterial growth is that this mutation couldn't be obtained in M. xanthus. However, it could be purified in E. coli. Could the authors do some experiments with the M. xanthus strain without the chromosomal MxDnaK1 and then introduce a plasmid with the mutated gene?

      Answer: We apologize for the confusing description. Our conclusion that the NBD is essential does not rest only on the fact that the mutation could not be obtained. In M. xanthus DK1622, MxDnaK1 is essential for cell survival, and in-situ deletion of the gene can be obtained only after insertion of a second copy of mxdnaK1 in the genome at the attB site. To verify whether the NBD region is required for the essentiality of MxDnaK1, we performed the region swapping of the in situ MxDnaK1 gene in the att::mxdnaK1 mutant (a DK1622 mutant containing a second copy of mxdnaK1), and successfully obtained the MxDnaK1 mutant swapped with the MxDnaK2 NBD region. The experiment indicated that the NBD of MxDnaK1 is essential for the cellular functions of the chaperone. We have added the information and modified the sentences in the manuscript. (Page 15; Line 308-319)

      3) All the experiments with purified proteins were done with MxDnaKs bearing His-tags. It doesn't say explicitly its position, but as they employed a pET28A it is likely that the tag is at the N-terminus, which is close to the linker region. As this tag might interfere, it should be removed for the experiments, or at least a control done with the tag removed.

      Answer: We apologize for the lack of detailed description. As you pointed out, the His-tags are located at the N-terminus of the DnaKs. The full lengths of MxDnaK1 and MxDnaK2 are 638 and 607 amino acids, respectively. The linker regions are located at amino acid positions 381-386 for MxDnaK1 and 387-392 for MxDnaK2. Therefore, we believe that the His-tag is not close to the linker regions. We have included this information in the new manuscript. (Page 24; Line 544-546)

      The purified His6-DnaK proteins were employed for holdase activity assays and in vitro dimerization assays. Several previous studies have utilized the same holdase activity assay method with His-tagged DnaK 8,9. We therefore suggest that the His-tag did not interfere with the holdase activity of DnaK. To exclude the influence of the His-tag on oligomerization, we conducted a control with the tag removed in the in vitro dimerization assay, and the result showed no difference (Figure S13).

      4) The authors state that MxDnaK dimerized in vitro with the NBD, and to disrupt the dimer they used 100 mM DTT, which is a very high concentration. As the protein has the His-tag, it should be removed to corroborate that it is not interfering with the dimerization.

      Answer: Thanks for your suggestion. As mentioned above, to exclude the influence of the His-tag on oligomerization, we conducted a control with the tag removed in the in vitro dimerization assay, and the result showed no difference (Figure S13).

      5) Why were the pulldown experiments done with MBP-MxDnaKs? Can you show a negative control between the MBP and the JDPs to rule out this interaction? It would be more suitable to do the pulldown assays with the purified MxDnaKs without the His-tags (and the His-tagged JDPs that were employed).

      Answer: Thanks for your suggestion. As mentioned above, there are some flaws in the experimental design of the pulldown assay. Thus, we employed the nLuc assay as an alternative to the pulldown experiment to validate the interaction between MxDnaKs and JDPs (Figure S9).

      Minor comments:

      • E. coli's DnaK is only essential in heat shock conditions and for the lambda phage cycle. If MxDnaK1 is similar to this Hsp70, why would the substitution of its NBD with the NBD of MxDnaK2 be lethal for bacterial growth?

      Answer: Thanks for the comments. As you correctly point out, DnaK is nonessential in E. coli. But in some other bacteria, DnaK also plays an essential role in cell growth for different reasons 10-12. In our previous studies, we determined that MxDnaK1 is essential in M. xanthus DK1622, whereas MxDnaK2 is nonessential. In this study, we performed region swapping and found that only the NBD of MxDnaK1 was irreplaceable. In our opinion, this result indicates that the NBD plays an important role in the functional diversity between MxDnaK1 and MxDnaK2.

      • I think that the writing should be revised and in the supporting information the captions of the figures should include more information.

      Answer: Thanks a lot for the suggestion. We revised the manuscript and added more information in the legends of supplementary figures.

      Reviewer #2 (Significance):

      -General assessment: This is a nice piece of work which would benefit from revision to address the comments above. The authors showed the roles and differences between two DnaK in the same organism. They track these differences to the subdomains of the MxDnaKs and co-chaperones. It will be interesting for future works to explore more deeply the co-chaperones and their interactions.

      -Advance: I think that this manuscript fills a gap regarding the role of DnaK duplicated in bacterial strains.

      -Audience: I would say that the audience is broad and includes scientists interested in protein folding and chaperones, as well as myxobacteria.

      1. Rosenzweig, R., Nillegoda, N. B., Mayer, M. P. & Bukau, B. The Hsp70 chaperone network. Nat Rev Mol Cell Biol 20, 665-680, doi:10.1038/s41580-019-0133-3 (2019).
      2. Kampinga, H. H. & Craig, E. A. The HSP70 chaperone machinery: J proteins as drivers of functional specificity. Nat Rev Mol Cell Biol 11, 579-592, doi:10.1038/nrm2941 (2010).
      3. Calloni, G. et al. DnaK functions as a central hub in the E. coli chaperone network. Cell Rep 1, 251-264, doi:10.1016/j.celrep.2011.12.007 (2012).
      4. Dixon, A. S. et al. NanoLuc Complementation Reporter Optimized for Accurate Measurement of Protein Interactions in Cells. ACS Chem Biol 11, 400-408, doi:10.1021/acschembio.5b00753 (2016).
      5. Fredriksson, A., Ballesteros, M., Dukan, S. & Nystrom, T. Defense against protein carbonylation by DnaK/DnaJ and proteases of the heat shock regulon. J Bacteriol 187, 4207-4213, doi:10.1128/JB.187.12.4207-4213.2005 (2005).
      6. Santra, M., Dill, K. A. & de Graff, A. M. R. How Do Chaperones Protect a Cell's Proteins from Oxidative Damage? Cell Syst 6, 743-751 e743, doi:10.1016/j.cels.2018.05.001 (2018).
      7. Fredriksson, A., Ballesteros, M., Dukan, S. & Nystrom, T. Induction of the heat shock regulon in response to increased mistranslation requires oxidative modification of the malformed proteins. Mol Microbiol 59, 350-359, doi:10.1111/j.1365-2958.2005.04947.x (2006).
      8. Chang, L., Thompson, A. D., Ung, P., Carlson, H. A. & Gestwicki, J. E. Mutagenesis reveals the complex relationships between ATPase rate and the chaperone activities of Escherichia coli heat shock protein 70 (Hsp70/DnaK). J Biol Chem 285, 21282-21291, doi:10.1074/jbc.M110.124149 (2010).
      9. Thompson, A. D., Bernard, S. M., Skiniotis, G. & Gestwicki, J. E. Visualization and functional analysis of the oligomeric states of Escherichia coli heat shock protein 70 (Hsp70/DnaK). Cell Stress Chaperones 17, 313-327, doi:10.1007/s12192-011-0307-1 (2012).
      10. Shonhai, A., Boshoff, A. & Blatch, G. L. The structural and functional diversity of Hsp70 proteins from Plasmodium falciparum. Protein Sci 16, 1803-1818, doi:10.1110/ps.072918107 (2007).
      11. Vermeersch, L. et al. On the duration of the microbial lag phase. Curr Genet 65, 721-727, doi:10.1007/s00294-019-00938-2 (2019).
      12. Burkholder, W. F. et al. Mutations in the C-terminal fragment of DnaK affecting peptide binding. Proc Natl Acad Sci U S A 93, 10632-10637, doi:10.1073/pnas.93.20.10632 (1996).
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Kellner and Berlin present their research findings pertaining to the effect of GRIN2B variants that modify NMDA receptor function and pharmacology. While these mutations were published previously, the new manuscript provides a more thorough investigation into the effects that these variants pose when incorporated into heteromeric complexes with either wildtype GluN2B or GluN2A. NMDA receptors containing only a single mutated GluN2B subunit are more relevant to the disease cases because the associated patients are heterozygous for the variant. The authors achieved selective expression of receptor heteromeric complexes by utilising an established trafficking control system. The authors found that while a single variant subunit in the receptor complex is largely dominant in its effect on reducing glutamate potency of the NMDA receptor, its effect on receptor pharmacology varied. Unlike diheteromeric receptors containing mutated subunits, polyamine spermine potentiated GluN1/2B (but not GluN1/2A/2B) receptors that contained a single mutated GluN2B. In contrast, the neurosteroid, pregnenolone-sulfate (PS), was effective at potentiating the NMDA receptor currents (to varying degrees) regardless of the subunit composition. The potentiation of NMDA receptor currents by PS was also observed in neurons overexpressing the variants.

      The techniques used in this study were appropriate to address the objectives and the overall effects are large, and generally convincing. I like the way the results are presented, although have a few (easily addressable) comments.

      We thank the reviewer for the positive remarks on our manuscript.

      Major comments:

      #1 When incrementally adding drugs (e.g. traces in figures 5 and 6), it doesn't always appear like the response has plateaued before changing the solutions/drugs. Therefore, I am curious to what extent the effects observed are underestimated.

      The reviewer is correct to note that some responses do not necessarily reach a plateau, despite our efforts to reach steady-state (as shown in most figures, e.g., Figs. 1-4, 6b, etc.), in particular when applying pregnenolone-sulfate (PS) (Fig. 5a, all traces in middle and bottom rows). However, in several instances, this was unobtainable due to the very slow effect of the neurosteroid (its mode of action is from within the membrane) and the very large size of the cell (~1 mm). For these reasons, these experiments mandated excessively long exposures (~minutes) of oocytes to glutamate and PS (see scale bar: 20 secs) to try to reach steady-state; however, this also caused deterioration of some cells (which did not return to baseline and were therefore discarded). Thus, we eventually converged on settings whereby we did not expose oocytes to more than 4 minutes of the drug. Nevertheless, to try to estimate the extent of the underestimation (if any), we fitted the currents with a standard mono-exponential fit, as previously reported1–3 (Suppl. 5a). We found that our application times of PS were, on average, three times the response’s time constant (tau) (Suppl. 5b), and we found a very weak relationship (R2 = 0.09) between the response to PS and the time of its application (Suppl. 5c). These are now explicitly mentioned in the text (line #203), and in the legend of Suppl. 5. These results suggest that the response reached approximately 95% (1 - 1/e^3) of the steady-state value, and we are therefore confident that there is very little, if any, underestimation of the extent of PS potentiation.
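
      The 95% figure follows directly from the mono-exponential model: a response rising as 1 - exp(-t/tau) has reached 1 - e^-3 of its plateau after three time constants. A minimal sketch of that arithmetic is given below; the function name and the unit time constant are illustrative assumptions, not values or code from the study.

```python
import math


def fraction_of_steady_state(application_time, tau):
    """Fraction of the plateau reached by a mono-exponential rise.

    Assumes the response follows 1 - exp(-t / tau); both arguments must be
    in the same time units. Hypothetical helper for illustration only.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    if application_time < 0:
        raise ValueError("application_time must be non-negative")
    return 1.0 - math.exp(-application_time / tau)


# An application lasting three time constants (tau set to 1, so units cancel).
reached = fraction_of_steady_state(3.0, 1.0)
print(f"{reached:.3f}")  # prints 0.950
```

      Under the same assumption, an application lasting only one time constant would reach about 63% of the plateau, which is why the ratio of application time to tau matters more than the absolute duration.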

      2 Also, in relation to figure 6, to what extent does agonist application cause desensitization here? Looking at traces in Figure 6b it appears that there is some desensitization and it isn’t clear to what extent this persists during the solution changes.

      Agonist-induced desensitization of NMDAR currents is a well-known phenomenon, but it is well established that it is not always observed in cells, including neurons (e.g., 4–7). In general, we did not observe frequent desensitization (we now provide a larger variety of traces of desensitizing and non-desensitizing currents; Fig. 6b, Suppl. 7e and Suppl. 8a). Nevertheless, we explicitly note that in neurons, currents that didn’t reach steady-state after application of 100 mM NMDA were excluded from analysis (Methods - Patch clamping of cultured neurons, line #474), and in most cases desensitization was minor (or absent) following application of 100 mM NMDA and 100 mM PS (Fig. 6b).

      3 Could the authors conduct/show the controls with NMDA alone (for 50-60 s), or NMDA followed by PE-S (without ifenprodil)?

      These recordings are now shown in Fig. 6b and Suppl. 8a, (as opposed to Suppl. 7e).

      #4 Finally, figure 5 shows the effect of the neurosteroid (and ifenprodil) on NMDA-evoked currents in neurons overexpressing the GluN2B variants. However, these currents probably reflect a mixture of extrasynaptic and synaptic receptors. To what extent are synaptic NMDA receptors affected by the variants?

      To show the extent of the effect of the variants on synaptic receptors, we recorded miniature NMDA-dependent EPSCs (mEPSCNMDA), as described in our previous report8. We find that the variants completely eliminate the appearance of mEPSCs (Suppl. 7a, b). The change in minis’ frequency is not the result of a presynaptic change or a change in synapse number9, as we have shown that AMPAR-mEPSC frequency was unaffected by the variants (i.e., synapse number and probability of presynaptic release are unchanged by the variants).

      To further address this, we also explored the relative synaptic vs. extrasynaptic distribution of the variants by using the established MK-801 protocol (to block all synaptic receptors during spontaneous activity, leaving extrasynaptic receptors unblocked)10,11. In neurons overexpressing the GluN2B-wt subunit, we obtained an extrasynaptic fraction of 38%, highly consistent with previous reports12,13. Overexpression of the variants, however, yielded a significantly higher fraction (~50%) of the remaining current, supposedly suggesting more variant receptors at extrasynaptic loci (Suppl. 8b, c). However, due to the experimental settings we have chosen, the results from this experiment represent quite the inverse when involving extreme LoF variants. Firstly, 100 mM NMDA does not saturate variant receptors (whether pure, mixed di- or tri-heteromers, see Table 1). Secondly, normal neurotransmission does not open synaptic receptors containing mutant GluN2B-subunits, as attested by the complete absence of mEPSCs (see Suppl. 7a, b and 8,9). Thus, during the 10-minute exposure to MK-801, only (mostly) purely wt receptors are blocked by spontaneous synaptic activity, and thus the second bout of 100 mM NMDA solely exposes the remaining wt-receptors. An increase in the number reflects more wt-receptors at the extrasynapse than at the synapse. Thus, the observed increase in the fraction of extrasynaptic receptors in neurons overexpressing the variants implies that the number of wt-receptors necessarily decreases at the synapse and increases at the extrasynapse. We deem this to ensue from the incorporation of the variants at the synapse. This increase cannot be explained by an overall increase in membrane expression of wt-receptors in neurons overexpressing the variants, as these cells show a strong reduction in Imax (see Fig. 6c and Suppl. 7e). This is now detailed in the text (lines #270-290).
      

      Minor comments:

      5 Looking at the fits in the graph of Figure 2b it appears that the slope on the concentration response curves is less steep for the mixed 2B-diheteromeric NMDA receptors. How much are the Hill coefficients changing and can this be interpreted to provide more mechanistic insight? Wouldn't it make sense to include the Hill coefficients in Table 1?

      We agree with the reviewer’s observation. Actually, the mixed di-heteromers have a similar Hill coefficient (nH) to the purely di-heteromeric GluN2Bwt receptors (see Table 1), and these show the typical nH of ~1 (e.g., 14–16). The only diverging groups are the purely di-heteromeric variant-containing channels (receptors containing only G689C/S; nH~2). Although these may suggest positive cooperativity between the subunits, we are less inclined to infer insights from the latter owing to the fact that we limited our examination to 10 mM glutamate (we limit exposure of oocytes to 10 mM glutamate due to artifacts arising past this concentration, as discussed in Kellner et al.8: Fig. 2—figure supplement 1). This description is now mentioned in lines #149, 318 and 319.
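
      To make the role of nH concrete, the sketch below evaluates the standard Hill equation, response = 1 / (1 + (EC50 / [agonist])^nH), for nH = 1 and nH = 2. The function name and the EC50 and concentration values are arbitrary, chosen only for illustration; they are not the fitted values from Table 1.

```python
def hill_response(conc, ec50, n_h):
    """Normalized response (0 to 1) predicted by the Hill equation.

    conc and ec50 must share the same units; n_h is the Hill coefficient.
    Hypothetical helper for illustration only.
    """
    if ec50 <= 0 or n_h <= 0:
        raise ValueError("ec50 and n_h must be positive")
    if conc < 0:
        raise ValueError("conc must be non-negative")
    if conc == 0:
        return 0.0
    return 1.0 / (1.0 + (ec50 / conc) ** n_h)


# Illustrative values only: EC50 = 1 (arbitrary units), tenfold below and above.
for n_h in (1, 2):
    below = hill_response(0.1, 1.0, n_h)
    above = hill_response(10.0, 1.0, n_h)
    print(f"nH={n_h}: response at 0.1x EC50 = {below:.3f}, at 10x EC50 = {above:.3f}")
```

      With nH = 2 the curve moves from near zero to near saturation over a narrower concentration range than with nH = 1, while the response at the EC50 stays at one half in both cases. The practical consequence is that a curve that is not sampled up to saturation constrains the slope only weakly.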

      6 The authors illustrate the changes in potency by the shift in the concentration response curves, but is there any change in efficacy? A simple way to illustrate this would be also present a simple graph showing the maximum current amplitudes (i.e. to 10 mM glutamate) for each of the receptor complexes.

      We now provide these data in Suppl. 2a, b. We would like to note, however, that the tailed receptors (i.e., subunits with carboxy-termini tagged with C1/C2 tails, see Fig. 1a) generally express less than the native subunits (Suppl. 2c). This description is detailed in lines #162-166.

      #7 The authors characterize the 'apparent' affinity (or potency) of the receptor using concentration-response curves, but numerous points in the manuscript refer to changes in affinity. None of the experiments shown directly measure affinity (which would require ligand-binding assays) and so the use of the word affinity is inaccurate/misleading. I suggest the authors replace the instances of the word 'affinity' with 'potency'.

      We apologize for the confusion surrounding our use of the term affinity. In fact, we do initially define this term in the introduction (page #4): “apparent glutamate affinity (EC50)”, to differentiate it from affinity (KD). Regardless, and to avoid confusion, we have replaced all instances of the term with “potency”, as suggested by the reviewer.

      #8 In the third line of the abstract, the authors wrote, 'for which there are no treatments' in relation to GRINopathies. My understanding is that there are symptomatic treatments but that there are no disease-modifying treatments.

      Indeed, all current treatments are supportive, rather than curative or disease-modifying. These are now better defined in the abstract.

      #9 The authors have interchangeably used the terms NMDAR or GluNRs throughout the manuscript. I suggest sticking to one of these terms. I would suggest NMDARs since this is less likely to be misread as a specific NMDA receptor subunit.

      Agreed and corrected throughout manuscript.

      #10 Typos:
      1) Results paragraph 2 sentence one: 'We thereby produced GluN2B-wt, GluN2B-G689C and GluN2B-G689S subunits tagged with C1 or C2, co-expressed these along with the GluN1a-wt subunits in...'
      2) Results paragraph 2: '...but these were mainly noticeable when oocytes are were exposed to high (saturating) glutamate concentrations...'
      3) Last sentence in the second to last paragraph of the results section entitled 'Mixed di-and tri-heteromeric channels...': 'This , PS may serve to rescue...'
      4) Last sentence in last paragraph of the results section entitled 'Mixed di-and tri-heteromeric channels...': 'Despite the latter, we found no evidence for any direct effect of three different physiologically relevant concentrations of the drug on di- or tri-heteromeric receptors'

      All typos corrected.

      #11 Figures 1e, 2b, 3b: it would be helpful to add a legend to the graph so that the curves can be interpreted without having to read through the figure legend.

      Corrected.

      #12 The bar graphs in Figure 6 show individual data points but those in figures 4b and 5b don't. Can the authors please add the data points to these graphs?

      Individual data points have been added.

      #13 It would be helpful to reviewers that future manuscripts by the authors include page numbers and line numbers.

      Included.

      **Referees cross-commenting**

      #14 Reviewers 2 and 3 highlight an important issue concerning figure 6 and the extent to which the overexpressed variant subunits can compete and assemble with endogenous NMDA receptors (unlike the system where the surface expression of specific receptor complexes is controlled). Indeed, in the recent paper by the same authors, the two variants differed in their surface expression (in HEK cells), with G689C expressing particularly poorly. With reference to the second minor comment of Reviewer 1, the maximum current amplitudes would of course need to be normalized to cell surface expression of the receptor to gain any insight into efficacy.

      We provide maximal current amplitudes (Imax) as a proxy for expression level, as typically done (e.g.,8,17). These are now shown in Suppl. 2a, b (and see our response to comment #6, above). We would like to emphasize that we find it challenging to gain insights about the efficacy of the variants in neuronal synapses, as we purposefully express non-C1/C2 tagged subunits in neurons (because we want the variants to assemble with endogenous subunits). Moreover, the C1/C2-tagged subunits (whether wt or variants) express less than their non-tagged NMDAR counterparts. For instance, tagged GluN2B-wt subunits express at ~50% compared to non-tagged GluN2B wt subunits (Suppl. 2c). Thus, we find that the efficacy of the C1/C2 tagged-subunits is less relatable to the non-tagged subunits (which are used in neurons and likely more relevant to the disease).

      Despite the latter, we deem that we have specifically addressed this issue by measuring miniature EPSCs (mEPSCs) (see our reply to comment #4, Suppl. 7a, b). Briefly, even though the non-tagged G689C expresses at ~40% compared to other subunits (in oocytes and mammalian cells8), in neurons it engenders a robust (and highly significant) negative effect on synaptic currents (mEPSCs), as strong as that of the G689S variant, which expresses much more robustly (non-tagged G689S expresses to the same extent as wt subunits). This demonstrates that the reduced efficacy of the tagged subunits is less relatable to the non-tagged subunits and, importantly, that it does not hinder the variants’ ability to incorporate within the synapse and affect function (i.e., exert a dominant negative effect). Here, we extend these observations towards the major postnatal channel subtype, namely tri-heteromers (2A/2B*), and therefore demonstrate that the robust dominant negative effect of the G689C and G689S variants is likely due to their ability to incorporate within the predominant receptor subtype at the synapse (Suppl. 8).

      Reviewer #1 (Significance (Required)):

      This study emphasizes the complex pattern of effects that variants can have on glutamate receptor function and pharmacology, especially considering the context of receptor subunit composition. The authors have followed up their previous findings on the same mutants (Kellner et al, 2021, Elife), but used a trafficking control system here to characterize properties of mutated receptor complexes that are most likely to exist in neurons. The authors show that the defective currents mediated by NMDA receptors containing a loss-of-function GluN2B variant can be enhanced by neurosteroids (and in the case of GluN1/2B receptors, polyamines also). Development and approval of neurosteroids for the clinic would be required for the findings to translate to a therapy for patients. Readers should also be aware that neurosteroids act on other receptors too (e.g. GABA receptors), which could complicate the outcome. The expertise of the reviewer is in glutamate receptors and synaptic transmission.

      We agree with the reviewer’s comment pertaining to challenges in translating PS to the clinic. Indeed, we explicitly mentioned its inhibitory effect on GABAA receptors (see line #366-367 and reference 18), as well as note its potential negative effect over GluN2C/D-containing receptors (line #365 and reference 19). We further describe alternative neurosteroids and means to bypass the limitations of PS, for instance by use of 24(S)-hydroxycholesterol6,18 or synthetic analogues (SGE-201, SGE-301)6. Lastly, we also propose a novel therapeutic approach, for which we did not find any mentions in the literature with regard to GRINopathies, consisting of the use of the FDA-approved Efavirenz (anti-retroviral compound20) to promote activity of cytochrome P450 46A1 (CYP46A1) to increase amounts of 24-S in the brain (discussion, lines #370-383).


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      #1 The objective of this paper is to assess whether a single mutated subunit of GRIN can affect the function of various forms of NMDA receptors. In particular, this study investigates the functional consequences of a GRIN variant when assembled within tri-heteromers, containing 2 GluN1, 1 GluN2A and 1 GluN2B subunits, the major postnatal receptor type. For this purpose, the authors artificially forced the subunits to associate in predefined complexes, using chimeras of GRIN subunits fused to GABAb receptor retention control sites at the endoplasmic reticulum. This trick allows control of the stoichiometry of the channels at the membrane and thus a focus on the function of a single type of NMDA receptor. The take-home message of the paper is that a single GluN2B‐variant, whether assembled with a GluN2B‐wt subunit to form a mixed di‐heteromer or with a GluN2A‐wt‐subunit (tri‐heteromer), strongly impairs receptor function, as reported by a decrease in the apparent glutamate affinity of the receptor.

      Altogether, this is a straightforward study of great interest for the GRIN community.

      We greatly appreciate the reviewer’s comment about the relevance of our work towards the GRIN-community.

      2 However, the way the background and purpose of the study (title and abstract) are presented is a bit confusing for non-specialists and could be easily improved. Technical information, which is crucial to validate the conclusions drawn from data analysis, should be added to the article. Some additional experiments are suggested to consolidate the work. Finally, additional discussion points are strongly encouraged.

      We apologize for not making the paper more accessible to a broader readership. We did so for the sake of brevity. Nevertheless, we have re-written major parts of the manuscript to address this issue and retitled the report: “Rescuing Tri-Heteromeric NMDA Receptor Function: The potential of Pregnenolone-Sulfate in Loss-of-Function GRIN2B Mutations”.

      Specific comments

      Abstract / Title:

      #3 This work shows that a single GRIN variant impairs the function of various forms of NMDA receptors. Several sentences in the title and the abstract are confusing for a non-specialized audience. "Two extreme Loss‐of‐Function GRIN2B‐mutations are detrimental to triheteromeric NMDAR‐function, but rescued by pregnanolone‐sulfate." "Here, we have systematically examined how two de novo GRIN2B variants (G689C and G689S) affect the function of di‐ and tri‐heteromers." The number of variants tested is not of capital importance in the title, especially because one could believe that both are tested at the same time; similarly, when variants are named in the abstract, the fact that only 1 variant is studied at a time should be clarified (G689C OR G689S). Indeed, the problem is obvious to those familiar with GRIN disorders, but if this paper is to be published in a journal reaching a large audience rather than a specialized audience, the title of the paper should be modified.

      As noted in our reply to comment #1 of this reviewer, we apologize for not making the paper more accessible and have therefore changed the title and re-written major parts of the manuscript to address this issue. We would like to note that we appreciate the reviewer’s comment and intent to increase the readership of our manuscript.

      #4 "We find that the inclusion of a single GluN2B‐variant within mixed di‐ or tri‐heteromeric channels is sufficient to prompt a strong reduction in the receptors' glutamate affinity, but these reductions are not as drastic as in purely di‐heteromeric receptors containing two copies of the variants. This observation is supported by the ability of a GluN2B‐selective potentiator (spermine) to potentiate mixed diheteromeric channels." Please clarify the link between the two sentences. How does spermine potentiation of mixed diheteromeric channels support the observation that the inclusion of a single GluN2B‐variant has less effect than the inclusion of two variants?

      Our intention was to highlight that mixed di-heteromeric channels (2B/2B*) are less “damaged” (this is the link) than purely mutant di-heteromeric channels (2B*/2B*). Explicitly, mixed di-heteromers show a smaller reduction in glutamate potency AND are also spermine-responsive, whereas purely mutant di-heteromers (2B*/2B*) show a larger reduction in potency AND do not respond to spermine at all. We have rephrased the sentences in our current manuscript to be clearer:

      For instance: The positive responses of mixed di-heteromers, compared to the null effect on pure di-heteromers, are likely the result of the restored pH-sensitivity of mixed di-heteromers (Suppl. 3). This was surprising as the minimal and essential rules of engagement for potentiation by spermine are not well established, particularly in the case of tri-heteromers21,22 (see discussion, lines #341-353).

      Methods

      #5 All this study is based on the use of a unique ER‐retention technique to limit expression of a desired receptor‐population at the membrane of cells. According to the ER system retention of GABAb receptor, used in this study, while C1/C1-fused subunits are retained in the ER, C2/C2 reach the cell surface and the association of C1/C2 in the ER enables cell-surface targeting of the heterodimer. However, GB2 does not contain any retention signal and can reach the cell surface in the absence of GB1, as a functionally inactive homo-dimer (doi: 10.1042/BJ20041435). If there is an experimental trick that prevents the addressing of C2/C2 to the cell membrane, it should be specified and explained. This is critically important for understanding which receptor populations the data are derived from: receptors containing C1/C2 fused subunits only as stated by the authors, or C1/C2 and C2/C2 fused subunits?

      We base our experiments on two seminal reports (23,24) that developed this unique method (which we also refer to in the text, lines #112-116). Briefly, the method employs the binding motifs of GABAB1 (GB1) and GB2 subunits and ER-retention motifs (these are now better detailed in the Methods section, line #448). Previous reports explicitly demonstrate that C1/C1- OR C2/C2-containing receptors do not reach the plasma membrane (or do so only very minimally), and we have reproduced these data with our variants (C1/C1: Suppl. 1a-d; C2/C2: Fig. 1a-c).

      Figures

      #6 NMDA-receptor current amplitude should be normalized by the membrane expression of the receptors. A preliminary experiment should measure the effective cell surface expression of each of the subunits in the different transfection conditions.

      To address the effective cell surface expression, we employed Imax as a proxy for functional expression (e.g.,8,17). These are now shown in Suppl. 2a, b (and see our response to reviewer 1, comments #6 and 14). As expected, we find significantly reduced efficacy of the variants compared to wt-receptors, and the purely mutant di-heteromeric receptors exhibit the weakest efficacy. We have also addressed this issue by measuring miniature EPSCs (mEPSCs) (see our reply to reviewer 1, comment #4). We find that the variants abolish mEPSCNMDA frequency (Suppl. 7a, b). This shows that the variants’ reduced efficacy translates to elimination of synaptic activity (dominant negative effect) (also seen in Suppl. 8).

      #7 Fig.1a

      The scheme should include C2-C2 complexes and mention whether these complexes are expressed at the cell surface (see previous and following comments).

      As noted in our reply to comment #5 of this reviewer (above), C2/C2-containing receptors do not reach the plasma membrane (Fig. 1a-c). To avoid confusion, we have now added this scheme to the cartoon presented in Fig. 1a and have provided a more detailed description of the method and clones produced in the Methods section (line # 448).

      #8 Fig.1b and c

      Current from cells transfected with GluN2B‐wt‐C1 and GluN2B‐wt‐C2 should be compared to current expressed in cells expressing untagged receptor subunits (GluN2B‐wt). Current from cells transfected with GluN2B‐wt‐C1 alone should be shown as well (although expected to be retained in the ER), as performed for GluN2A‐wt‐C1 GluN2B‐wt‐C1 in Suppl. Fig. 1a.

      Current comparisons of oocytes expressing tagged GluN2B‐wt‐C1 and GluN2B‐wt‐C2 and non-tagged GluN2B‐wt are now demonstrated in Suppl. 2c. The results indicate that the “tags” (C1 and C2) affect the expression of the subunits. We have also added a sample trace of current from a cell expressing the GluN2B‐wt‐C1 alone (Fig. 1b).

      #9 How could you explain the null current from cells transfected with GluN2B‐wt‐C2 alone (Fig.1b middle, and 1c)? Since GB2 does not contain any retention signal and can reach the cell surface in the absence of GB1, GluN2B‐wt‐C2 is supposed to reach the cell surface. This is a very important point to clarify (I am probably missing a technical detail) because if the sub-unit tagged with C2 does reach the cell surface, then all the results and conclusions drawn from the C1-C2 conditions are wrong and could be attributed to a mix of complexes containing either C1-C2 or C2-C2.

      We now realize that the reviewer was missing a crucial technical detail, namely how the clones are designed. Briefly, all clones have ER retention motifs and cannot reach the plasma membrane unless they assemble as C1/C2 (23,24). Also, please see our replies to comments #5 and 7 of this reviewer (and Methods section, line #448).

      My following comments are based on the assumption that only receptors containing C1-C2 tagged subunits reach the membrane (as assumed by the authors and suggested in Figure 1b middle), but explanations should absolutely be provided to convince the reader.

      (See our replies above to comments #5, 7 and 9, and references 23,24.)

      Fig. 4a and 5a

      #10 Please keep the current scale constant between all current illustrations within the same figure (4a and 5a). Indeed, the spermine- or SP-induced potentiation is important (and is presently quantified in the histograms of Fig. 4b and 5b), but it would also add excellent value to the paper to know whether the amount of current recorded in cells expressing one mutant subunit in the presence of SP (for example GluN2A‐wt‐C1 GluN2B‐G689S‐C2) is comparable to the current recorded in wt receptor-expressing cells (GluN2A‐wt‐C1 GluN2B‐wt‐C2) in the absence of SP. A special figure could quantify this rescue effect of SP, measuring and comparing the mean currents recorded in these conditions (one current illustration is not sufficient given variations between similar samples). By the way, 5 mM glutamate might be an excessive concentration. At 1 mM, the expected synaptic concentration of glutamate following an action potential, the response of the mutated receptor is much lower than that of the WT, which is already almost maximal (according to Figure 3 and Suppl. 1). Under these conditions, a two-fold SP-induced potentiation of the GluN2A‐wt‐C1 GluN2B‐G689S‐C2 current could bring it to the level of control currents recorded in GluN2A‐wt‐C1 GluN2B‐wt‐C2 cells.

      We have rescaled all current amplitudes in Figs. 4 and 5 to be identical in size for easier comparison.

      We have added all current amplitudes to examine the rescue effect of the two drugs in cells overexpressing a specific channel subtype, as requested (Suppl. 4). We find that, indeed, the potentiated currents of the mutant receptors reach (or even surpass) the basal Imax (i.e., current before potentiation) of the non-mutated receptors (Suppl. 4, dashed statistics bar).

      In neurons, we address this in two ways. First, we show that the total NMDA-current is reduced by expression of the variants, and this current is “normalized” by PS (Fig. 6a-c). Similar reductions in Imax (by the variants) are shown in Suppl. 7e (to provide more examples). Second, neurotransmission (i.e., 1 mM glutamate25,26) is not sufficient for activating mutant receptors, certainly not pure di-heteromers (see Table 1, EC50, and Suppl. 7a, b: no mEPSCs)27–29. Therefore, 5 mM was required. Together, these strongly suggest that PS may normalize the currents of different receptors that respond to PS (under physiological settings and not 1 or 5 mM NMDA). As suggested by the reviewer, there are many subtypes, and some may be activated by ambient glutamate (as suggested by application of PS onto neurons without opening the receptors by NMDA; see Suppl. 7c, d).

      #11 Fig. 6

      Figure 6 is not convincing because cultured hippocampal neurons do express endogenous NMDA receptors. To what extent are the recorded currents affected by endogenous, non-mutated GluN2B subunits? Western blots showing an extinction of endogenous subunit expression when transfected tagged subunits are competitively expressed would be required.

      We have previously shown that the two variants incorporate very efficiently within synapses, causing a very robust elimination of synaptic currents (by measuring miniature NMDA-dependent EPSCs; minis) (see Fig. 8 in Kellner et al., eLife, 2021 (ref. 27), and the review by Sabo et al. (ref. 9)). A change in the frequency of minis can be interpreted as either a presynaptic change or a change in synapse number; however, we observed that AMPAR-mEPSC frequency was unaffected by these variants. This implies that synapse number and probability of release are unchanged by the variants. As the experiments are performed in wild-type neurons (which express wild-type GluN2A and -2B), the dramatic effects we observed on minis suggest a dominant-negative effect of these disease-associated GluN2B variants. This is consistent with our observations that mutant subunits can co-assemble with wild-type GluN2B and/or GluN2A subunits. We have now reproduced this experiment (in fact, we employ this strategy prior to each experiment to ensure expression of the variants) (Suppl. 7a, b). This thereby shows that there are no available wt-receptors at the synapse.

      As there are various pools of NMDARs at synaptic and extrasynaptic sites, we did not think that a western blot would sufficiently differentiate between the latter, and thereby would not provide insight about extinction of wt-receptors (which could be simply pushed to other sites compared to synapse). Moreover, the intracellular pool of receptors is much larger than the amount of NMDARs that can be detected at the membrane (e.g., 30,31), and therefore electrophysiology seemed to be a better means to monitor membrane receptors only:

      Thus, to examine the distribution of the variants between synaptic and extrasynaptic loci, we employed a standard procedure consisting of the use of the activity-dependent blocker MK-801 (Methods). Briefly, neurons were persistently bathed in TTX, during which they were probed for Imax using 100 mM NMDA (to refrain from activating other GluRs), followed by application of MK-801 for 10 minutes to exclusively block synaptic receptors (that open following action-potential-independent miniature neurotransmission). This spares all extrasynaptic receptors from being blocked by MK-801, which are subsequently revealed by a second application of 100 mM NMDA (Suppl. 8a, inset)12. In neurons overexpressing the GluN2B-wt subunit, we obtained an extrasynaptic fraction of 38%, highly consistent with previous reports12,13. Overexpression of the variants, however, yielded a significantly higher fraction (~50%) of remaining current (Suppl. 8b, c), but instead of reflecting a larger pool of extrasynaptic receptors, the experiment represents quite the inverse when involving LoF variants. Firstly, 100 mM NMDA does not saturate variant receptors (whether pure, mixed di- or tri-heteromers, see Table 1). Secondly, normal neurotransmission does not open synaptic receptors containing mutant GluN2B-subunits, attested by the complete absence of mEPSCs (see Suppl. 7). Thus, during the 10-minute exposure to MK-801, only wt-receptors are blocked by spontaneous synaptic activity, and the second bout of 100 mM NMDA solely exposes the remaining wt-receptors at the extrasynapse. The observed increase in the fraction of extrasynaptic receptors in neurons overexpressing the variants therefore implies that the number of wt-receptors necessarily decreases at the synapse and increases at the extrasynapse, most likely due to the incorporation of the variants at the synapse. This increase cannot be explained by an overall increase in membrane expression of wt-receptors in neurons overexpressing the variants, as these cells show, yet again, a strong reduction in Imax as seen above (see Fig. 6c and Suppl. 7e) (lines #270-291). These results thereby suggest that purely wt-receptors are not necessarily eliminated from the membrane (extinct), but rather pushed outside of the synapse.
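
      The MK-801 protocol described above reduces to a ratio of two current amplitudes. A minimal sketch, assuming peak NMDA-evoked current amplitudes are measured before and after the MK-801 block; the function name and the amplitudes are hypothetical and were chosen only so that the ratio corresponds to the 38% reported for GluN2B-wt:

```python
def extrasynaptic_fraction(i_before_pA, i_after_mk801_pA):
    """Fraction of NMDA-evoked current amplitude remaining after MK-801
    has blocked the receptors opened by miniature synaptic transmission."""
    if i_before_pA <= 0:
        raise ValueError("baseline current amplitude must be positive")
    return i_after_mk801_pA / i_before_pA


# Hypothetical amplitudes (pA), for illustration only; not measured data.
fraction = extrasynaptic_fraction(1000.0, 380.0)
print(f"extrasynaptic fraction: {fraction:.0%}")
```

      As noted above, for LoF variants this ratio should be read with care: mutant-containing synaptic receptors are not opened by miniature neurotransmission and are not saturated by the NMDA application, so the ratio mainly reports on wt-receptors.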

      #12 Fig. 6b: “PE-S” on the graph should be replaced by “PS”

      Typo corrected.

      Discussion

      #13 The authors are surprised by the fact (Fig.2) that 1 variant reduces the apparent glutamate affinity of the receptor, but not as much as 2 variants, despite the fact that "NMDARs opening requires all four subunits to be liganded (i.e., occupied by a ligand) which implies that the least affine subunit should have dominated the final affinity of the receptor". I agree that the difference is noticeable, however the glutamate affinity for receptors containing 1 variant is much closer to that of receptors containing 2 variants than that of wild-type receptors. Hence, the results obtained do not seem so surprising and could result, as rightly explained by the authors from a possible cooperativity between the subunits.

      We agree with the reviewer that the glutamate potency of receptors containing 1 variant subunit is much closer to that of receptors containing 2 variant subunits. However, we maintain our surprise because we expected it to equal (not merely approach) the potency of the least affine subunit (the limiting factor). This is based on the notion that all four subunits need to be liganded for channel opening4,32–34. We gently raise the possibility of potential cooperativity (Table 1, see Hill-coefficient and 33,35,36), and also mention that this may stem from the variants’ lower proton sensitivity (Suppl. 3), which has also been shown to promote motions of the ATD (amino terminal domain) and increase open probability (positive cooperativity)36. Nevertheless, we are very careful with interpreting the Hill coefficient, as we limited exposure of oocytes to 10 mM glutamate due to artifacts arising past this concentration (see Kellner et al.8: Fig. 2—figure supplement 1). Thus, even the slightest underestimation of the maximal response would surely affect the slope. This description is now mentioned in lines #149, 318, 319.

      #14 On the other hand, the data in Figure 6 are much more difficult to interpret and reconcile with the nature of the expressed receptor subunits (which this time is not controlled) nor their association within the same receptor. However, this aspect, which is essential to the understanding of the consequences of 1 variant on neuronal signalling, is not discussed: Whatever the stoichiometry of the complexes in the heterozygous disease, the mutated and wild type GluN2B subunits coexist in the same cell: Either within the same di-heteromeric complexes GluN2B-wt + GluN2B-mutant, or in separate complexes but nevertheless expressed in the same cell, in di heteromeric (GluN2B-wt + GluN2B-wt and GluN2B-mutant + GluN2B-mutant); or tri-heteromeric (GluN2A-wt + GluN2B-wt and GluN2A-wt + GluN2B-mutant) complexes. Assuming that half of the complexes remain wild-type, e.g. (GluN2A-wt + GluN2B-wt and GluN2A-wt + GluN2B-mutant) we would expect (Fig. 6) a small decrease in NMDA current (carried only by the half that expresses the mutated subunit, and whose function is not zero but only decreased by about 20% in response to 5 mM Glutamate, Fig. 3b). The same reasoning applies to the di-heteromeric conditions (GluN2B-wt + GluN2B-wt; GluN2B-mutant + GluN2B-mutant), here again the decrease observed Fig. 6b is difficult to reconcile with the responses measured Fig. 2b.

      In other words, how can one explain a 50% decrease of the currents, instead of the 10% expected from the previous reasoning? In this experiment we do not know which subunits are expressed, their proportions, nor how they are associated in functional complexes, which makes the interpretation of the data impossible. The only, far-fetched, explanation for a 50% decrease would be that all (or the vast majority) of the complexes contain 1 wild-type subunit associated with 1 variant; a homogeneous 50% reduction in current could then be expected. But this extreme condition could only be possible in the case of di-heteromers, which is unlikely to be the case in Fig. 6, as GluN2A currents are measured in the presence of ifenprodil. To conclude:

      1) the comparison of the currents in transfected and non-transfected neurons does not make sense in figure 6b which is not convincing because we do not know the nature of the currents actually measured. A comparison in controlled condition would make more sense (as I suggested in the criticism of figures 4, 5).

      2) The reality of the combinations of expression and association between subunits within different complexes expressed in the same cell must be considered and taken into account in the interpretation of the data. Undoubtedly, the means of restoring the NMDA current will be different depending on the presence of mutated subunits in all functional channels or not.

      Indeed, neurons express a variety of different combinations of channel stoichiometry, including following transfection with the variants. We do find that the effect on whole-cell current is indeed ~50% (Fig. 6b, c), so it is safe to assume that 50% remain “wt”, but we do not know how they distribute between synaptic and extrasynaptic loci. Our results, however, argue against 50% of the receptors remaining at the synapse. First, mEPSCNMDA disappear (Suppl. 7a, b and see reply to comment #11 of this reviewer), but wt-receptors are still at the membrane, and they seem to be moving out of the synapse (Suppl. 8). Thus, we can only state with higher certainty that the variant subunits are very efficient in incorporating within mixed or pure receptors, especially at the synapse.

      We also consider that the reduction in the whole-cell current observed in Fig. 6b, c is not due to the remaining 50% GluN2B-wt-containing receptors, rather likely due to other variants, notably GluN2A, which are more prominent at postnatal stages37, such as in our case. In support, we see a large remaining current after saturating ifenprodil application (Suppl. 7e, f)38. Thus, the variants incorporate within all 2A/2B membrane receptors, at the synapse and outside it (i.e., extrasynaptic) (see Suppl. 8c).

      Referees cross-commenting

      The referees' comments are highly relevant. In particular, referee 3's comment 1 seems very interesting because it may help to better understand the discrepancy in the results observed in neurons, i.e. a 50% decrease in the currents induced by the expression of the mutant and wild type subunits in the same cells, whereas theoretically one would expect a 10% decrease of this current (cf. referee 2's 2nd comment in the discussion section). This comment 1 of referee 3 indeed stresses the fact that the control (non-transfected neurons) to which the heterozygous condition is compared is not the correct control, which should rather be neurons transfected with wild type receptor subunits. More generally, this comment underlines the importance of monitoring the effective membrane expression of the different subunits in each of the experimental conditions in order to be able to compare conditions and draw conclusions.

      We initially did not perform this control as the literature paints a clear picture whereby expression of the GluN2B-subunit (without adding excess of the GluN1 subunit) does not instigate a robust increase in surface expression of NMDARs (and thus the current remains approximately the same) 4,39–43, and see our reply to comment #14 (above), and reviewer 3 comment 1 (below). Nevertheless, we have now performed this test by overexpressing GluN2B-wt. In support of previous reports, we do not find any statistically significant difference in current size between non-transfected neurons and neurons solely overexpressing the GluN2B-wt subunit (Fig. 6a, b). Furthermore, application of PS onto naïve or GluN2B-wt-expressing neurons yields identical currents (Imax) and potentiation (Fig. 6c, d). These results argue that we did not obtain “overexpression”.

      We suggest that the 50% reduction in current size between neurons expressing the mutant and wt expressing neurons stems from the integration of mutant subunits and their dominant negative effect. Evidence for this incorporation is provided by the very strong reduction in synaptic currents (suppl 7a, b), and the supposedly higher abundance of wt-containing receptors in extrasynaptic regions (see reviewer 1 comment 4 and suppl 8). This is

      Reviewer #2 (Significance required):

      The novelty of the study is to evaluate the consequences of a single mutated subunit within NMDA receptors affected by a GRIN variant, to mimic the heterozygous condition of GRIN encephalopathies. This is of potential value for the field, and the interest could also be extended to other genetic diseases (at least the experimental way to study the functioning of only one desired stoichiometric configuration). The strength of this paper is precisely to isolate technically and to study the functioning of a desired stoichiometric configuration only. The main limitation of the paper is the interpretation that the authors make of their data in a physio-pathological context. This work could be of interest for a general audience, provided the title and summary are slightly modified. My area of expertise could not be closer to the topic of the article: Glutamate receptors; GRIN; molecular tinkering, cell culture, electrophysiology, receptor stoichiometry...

      We thank the reviewer for noting the value in our work and its potential contribution and interest to the field and other diseases. Per reviewer’s suggestion, we have modified the title and text to suit a larger audience.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This paper is a follow-up to an earlier paper published by the group (Kellner et al., eLife 2021), which aimed at characterizing the functional properties of two de novo GluN2B mutations in patients suffering from severe pediatric diseases, GluN2B-G689C and -G689S. NMDA receptors (NMDARs) are tetramers composed of two GluN1 and two GluN2 subunits. A single receptor can incorporate either two identical GluN2 subunits (di-heteromers) or two different GluN2 subunits (tri-heteromers), leading to a large diversity of NMDAR subtypes. The main NMDAR subtypes in the adult forebrain are GluN1/GluN2A and GluN1/GluN2B di-heteromers, as well as GluN1/GluN2A/GluN2B tri-heteromers. While the exact proportions of these three subtypes are still contentious, there is evidence that N1/2A/2B tri-heteromers form the major population of synaptic NMDARs in the adult forebrain. In addition, patients bearing pathogenic mutations are often heterozygous for the mutation, giving rise to mixed NMDARs incorporating one mutated and one intact GluN2 subunit. In their previous paper, Kellner et al. had shown that purely di-heteromeric GluN1/GluN2B-G689C and -G689S mutants display a drastic (> 1,000-fold) decrease of glutamate sensitivity and a decrease of surface expression. In the current paper, the authors characterize the effects of the -G689C and -G689S mutations on N1/2A/2B tri-heteromeric receptors, as well as on mixed di-heteromeric GluN1/GluN2B receptors containing one copy of the wild-type GluN2B subunit and one copy of the mutated GluN2B subunit. They show that one copy of the mutant subunit, either within mixed di-heteromeric or tri-heteromeric receptors, is sufficient to drastically decrease glutamate sensitivity, although the shift in glutamate EC50 is not as strong as in pure di-heteromeric receptors (≈ 500-fold). They furthermore explore strategies to counteract the hypofunction induced by these mutations by testing the effect of positive allosteric modulators (PAMs). They show that spermine, a GluN2B-specific PAM, can potentiate the activity of mixed di-heteromeric N1/2B but not N1/2A/2B tri-heteromers. However, pregnenolone sulfate (a 2A/2B-specific PAM) can potentiate the activity of both mixed di-heteromeric and tri-heteromeric NMDAR populations, either in oocytes or cultured neurons.

      I have very few major comments to make. The experiments are straightforward and adequate controls have been made. Here are my only two major comments:

      We thank the reviewer for the very detailed overview of our work and for appreciating our controls and methods.

      #1 About the experiment on cultured neurons. The authors compare the currents of cultured neurons transfected with GluN2B-G689C and -G689S to non transfected neurons. The adequate control is rather neurons transfected with the wild-type GluN2B subunit to even out any phenomenon linked to transfection of the neuron. Given the overexpression that can occur after transfection, the effect of the mutations on the size of NMDAR currents might be even stronger than what the authors show. However in that case PS might not completely rescue mutant NMDAR currents to wild-type levels.

      We initially did not perform this control as the literature paints a clear picture whereby expression of the GluN2B-subunit (without adding excess of the GluN1 subunit) does not instigate a robust increase in surface expression of NMDARs (and thus the current remains approximately the same) 4,39–43, and see our reply to comment #14 (above), and reviewer 3 comment 1 (below). Nevertheless, we have now performed this test by overexpressing GluN2B-wt. In support of previous reports, we do not find any statistically significant difference in current size between non-transfected neurons and neurons solely overexpressing the GluN2B-wt subunit (Fig. 6a, b). Furthermore, application of PS onto naïve or GluN2B-wt-expressing neurons yields identical currents (Imax) and potentiation (Fig. 6c, d). These results argue that we did not obtain “overexpression”. Thus, our results and interpretations hold true, and therefore do not underestimate the effects of PS in neurons.

      #2 How come high concentrations of glutamate (>100 µM) produce additional current on wt GluN1/GluN2B (with retention signals) compared to 100 µM glutamate, which is supposed to be saturating? It does not seem to stem from an osmotic effect since 10 mM glutamate does not produce any current on uninjected oocytes. Knowing that this "artefactual" effect might also occur in the mutant receptors, how do you take this effect into account when calculating the glutamate EC50s of the mutants? Given the drastic shift in EC50 produced by the mutant, taking into account this artefact is not going to change the conclusion, but the actual EC50s will be affected.

      GluN1/GluN2B-wt receptors (with or without retention signals) are indeed saturated at 100 µM glutamate. However, excessively large concentrations of glutamate (>100 µM) may yield artefacts even in non-injected oocytes (in 10 mM, this occurs in ~20% of the cells; see Kellner et al., 2021 (ref. 8), Fig. 2 and Suppl. 1c, d) as well as in GluN2B-wt injected oocytes (supplementary Table 1 in 44). This is not due to osmolarity, as rightly mentioned by the reviewer (and below), but rather possibly to endogenous glutamate receptors and transporters that do not readily contribute to current amplitude (these are extremely small currents), but can cause deterioration of the cell (and enhance ‘leak’) when activated for prolonged times by very large concentrations (e.g.,45). In fact, we explicitly report these to highlight potential artefacts, as these are often overlooked in the field. Regardless, most reports do not go past 100 µM glutamate, not even when describing GRIN2 mutations, since most mutations do not cause such drastic shifts in potency as we observed (to the best of our knowledge only one report describes such an extreme LoF mutation for a GluN2A variant46). Of note, these effects are not seen when glycine is applied at high concentrations (supporting lack of effect by osmolarity)47. Thus, we refrained from testing concentrations past 10 mM, aware that it may yield a slight underestimation of glutamate potency (and is perhaps the reason for the larger Hill coefficient, nH; see our reply to reviewer #1, comment #5). Importantly, despite the potential underestimation of the EC50, it does not change our conclusions as all groups are measured side-by-side (thus, the same underestimation equally applies to all other groups as well). We now mention this in more detail in the Methods under the section “Two Electrode Voltage Clamp recordings in Xenopus Laevis oocytes”.
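
      To illustrate how limiting the tested concentration range can bias an EC50 estimate, here is a minimal sketch based on the standard Hill equation. The EC50, Hill coefficient and normalization scheme below are hypothetical illustration values and assumptions, not the values in Table 1 or the fitting procedure used in the manuscript; only the 10 mM ceiling is taken from the text above.

```python
def hill_response(conc_mM, ec50_mM, n_h):
    """Fractional response (I / Imax) predicted by the Hill equation."""
    return conc_mM ** n_h / (ec50_mM ** n_h + conc_mM ** n_h)


def conc_for_fraction(fraction, ec50_mM, n_h):
    """Concentration producing a given fraction of the true Imax
    (algebraic inverse of the Hill equation)."""
    return ec50_mM * (fraction / (1.0 - fraction)) ** (1.0 / n_h)


# Hypothetical receptor parameters, for illustration only (not from Table 1).
TRUE_EC50_MM = 3.0
HILL_COEFF = 1.5
# Highest concentration tested (10 mM, as stated in the reply above).
MAX_TESTED_MM = 10.0

# Response at the highest tested concentration, as a fraction of the true Imax.
response_at_ceiling = hill_response(MAX_TESTED_MM, TRUE_EC50_MM, HILL_COEFF)

# If that response is treated as the maximum, the apparent EC50 is the
# concentration that produces half of it.
apparent_ec50 = conc_for_fraction(response_at_ceiling / 2.0, TRUE_EC50_MM, HILL_COEFF)

print(f"fraction of true Imax reached at ceiling: {response_at_ceiling:.2f}")
print(f"apparent EC50: {apparent_ec50:.2f} mM (true EC50: {TRUE_EC50_MM:.2f} mM)")
```

      Under these assumptions the apparent EC50 falls below the true EC50 whenever the ceiling does not saturate the receptor. Because the same ceiling is applied to every group, the bias acts in the same direction for all of them, which is the basis of the side-by-side argument made above.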

      Minor comments:

      #3 In the first paragraph of the "Results" section, when describing the design of the constructs used to force a heteromeric stoichiometry in recombinant systems, the authors write as if they had designed the constructs themselves: "Briefly, we tagged...are retained in the ER (Fig. 1a)". Please rewrite this paragraph to show that you used constructs that had been previously designed by another group (Hansen et al., 2014).

      We apologize. We did not mean to express that we have developed the method and indeed refer readers to the seminal works of those who did (Stroebel et al., 2014 and Hansen et al. 2014, lines #109-116). We did not go into details for the sake of brevity. We have rewritten this part to give proper acknowledgement to the method’s developers (also see Methods, line# 448).

      4 I do not see any evidence of "positive cooperativity" between subunits in ref. 32. Ref. 32, to the best of my knowledge, states that in N1/2A/2B tri-heteromers, the 2A subunit sets the biophysical properties of the tri-heteromer. But there is no account of mixed di-heteromers. In addition, the cooperativity between the glutamate and glycine binding sites is negative.

      The reviewer is correct, and we apologize for the mis-citation. Indeed, the cooperativity between glutamate- and glycine-binding is typically reported as negative48,49, and our intention was to highlight the strong cooperativity (whether positive or negative) observed between NMDAR-subunits and meant to cite the works of: 33,35,50 (lines . We have now rephrased the sentence: The divergence from this scenario suggests that the slight amelioration in potency could stem from positive cooperativity between the subunits50 (but see Hill coefficients in Table 1). Indeed, mixed receptors show restored proton sensitivity (Suppl. 3), which has been suggested to be coupled to other receptor features, notably increase in open probability.

      5 Interpretation of spermine action within the Results section: it is striking indeed to observe that the mutations in the context of a mixed di-heteromer still allow spermine potentiation, while they abolish this potentiation in pure di-heteromers. As rightly said in the discussion, the regain of spermine potentiation in the mixed compared to the pure diheteromers is likely due to a more favorable transduction of spermine signaling to the pore, likely via a higher pH sensitivity of mixed di-heteromers compared to di-heteromers. I would thus avoid the terms of "one single intact interface" for the mixed di-heteromer, since both spermine binding sites are likely intact in this NMDAR configuration. How is pH sensitivity affected in the mixed di-heteromers?

      We have performed detailed pH dose-response measurements for the various channel types (Suppl 3). We find that GluN2B mixed di-heteromers exhibit a similar IC50 to pure GluN2B-wt di-heteromers, thus explaining their ability to undergo potentiation by spermine via alleviation of proton inhibition. We therefore further suggest that mixed di-heteromers have higher pH-sensitivity compared to pure mutant di-heteromers and that this may also contribute to their higher spermine sensitivity. Lastly, we observed that all GluN2A-wt-containing tri-heteromeric receptors were non-responsive to spermine (Fig. 4a). In fact, under our experimental conditions tri-heteromers underwent slight inhibition by spermine, regardless of the identity of the GluN2B subunit (whether wt or variant) (Fig. 4b). Thus, as the tri-heteromers used here exhibit the same pH-sensitivity as 2B-di-heteromers, the only diverging aspect is the missing interface between the GluN1a and GluN2B subunits, demonstrating that potentiation by spermine requires at least one GluN2B-subunit with an intact proton sensitivity, and mandates two intact interfaces between GluN1-wt and GluN2B-wt subunits (Table 1)21.

      6 In the methods section, the oocyte recording solution (likely Ringer and not Barth) does not contain any potassium. This is probably a typo. Could you correct the composition of your Ringer?

      Corrected. We record NMDARs currents by use of a Barth solution containing (in mM): 100 NaCl, 0.3 BaCl2, 5 HEPES, pH 7.3 (adjusted with KOH, at ~2.5 mM) (as in 4,51).

      7 There are several typos, especially in the Discussion.

      We have corrected the typos throughout the publication.

      **Referees cross-commenting**

      I overall agree with the comments of reviewers 1 and 2. In particular, I agree that it is pointless to compare the absolute currents of non transfected neurons vs mutant-transfected neurons without an idea of receptor cell-surface expression.

      We have performed this experiment (Fig. 6); please see our reply to this reviewer’s comment #1.

      I would like however to give some precisions about some comments of Reviewer 2. About the ER retention technique to express tri-heteromers: I didn't know that the C2 signal could be addressed to the membrane on its own. The lack of leak current stemming from C1-C1 or C2-C2 combinations has been demonstrated in the paper establishing the technique (Hansen et al, 2014), as well as in another paper that developed an analog technique based on GABAB retention signals (Stroebel et al., J Neurosci 2014). So it is fair to consider that the authors were not surprised by the lack of current when co-expressing two GluN2B subunits carrying the C2 signal.

      We thank you for this addition and support for our observations.

      About the comparison about absolute currents wt vs mutants, +/- spermine (Fig. 4a and 5a). I agree with reviewer 2 that being able to compare absolute currents of wt without spermine to mutant + spermine would be very interesting to see if spermine can actually rescue mutant hypofunction. However, to the defense of the authors, comparing absolute current values of recordings from Xenopus oocytes is meaningless. Indeed the variability of currents for the same construct and same day of experiment is too high (there can be up to a ten-fold difference between the lowest and the highest current of oocytes expressing the same construct the same experimental day). A way to investigate this aspect would be to estimate the open probability of the different constructs with or without spermine via the inhibition kinetics of an open channel blocker (e.g. MK801) and measure surface expression by Western blot, but I am not sure these experiments are worth it for the spermine experiment.

      We agree with this reviewer about current size. It is quite variable among cells and would therefore introduce additional variability: the expression of these modified (C1/C2-tagged) subunits is affected both by the mutation itself (Kellner et al. 2021) and by the tagging (which strongly hampers their trafficking to the membrane, Suppl. 2c), with an unknown contribution of each variable. We therefore do not think such comparisons add value to our conclusions; nevertheless, to grant reviewer #2's request, we have added Suppl. 4, which shows the rescue effect of the different drugs.

      Reviewer #3 (Significance (Required)):

      This paper is not of high significance since most of the characterization of the 2B-G689C and -G689S de novo mutants found in patients has already been published (Kellner et al., eLife 2021). However, this paper is worth publishing since it brings new data on the effect of the mutations on tri-heteromeric and mixed di-heteromeric NMDAR populations, which are likely the most abundant NMDAR populations in the patient's brain at adult stage. Tri-heteromeric and mixed NMDAR populations have often been overlooked when studying pathogenic NMDAR mutations due to the difficulty to express them specifically in recombinant systems. This paper (in addition to other papers in the field, see for instance Elmasri et al., Brain Sci. 2022; Li et al., Hum. Mutat. 2019) shows that the effect of the mutations on the receptor biophysical and pharmacological properties (but also on trafficking) differ whether the receptor contains one or two copies of the mutant subunit. This paper is of interest to scientists interested in NMDA receptor structure-function and pharmacology, as well as clinicians interested in GRINopathies (pathologies linked to NMDAR mutations).

      I, the reviewer, am an expert in NMDAR structure-function and pharmacology. I believe I have sufficient expertise to evaluate the entirety of the paper.

      We thank the reviewer for appreciating and acknowledging the merits of our work for publication.

      References:

      1. Berlin, S. et al. Gαi and Gβγ Jointly Regulate the Conformations of a Gβγ Effector, the Neuronal G Protein-activated K+ Channel (GIRK). J. Biol. Chem. 285, 6179–6185 (2010).
      2. Kahanovitch, U., Berlin, S. & Dascal, N. Collision coupling in the GABAB receptor–G protein–GIRK signaling cascade. FEBS Lett. 591, 2816–2825 (2017).
      3. Berlin, S. et al. A Collision Coupling Model Governs the Activation of Neuronal GIRK1/2 Channels by Muscarinic-2 Receptors. Front. Pharmacol. 11, (2020).
      4. Berlin, S. et al. A family of photoswitchable NMDA receptors. eLife 5, e12040 (2016).
      5. Reyes-Guzman, E. A., Vega-Castro, N., Reyes-Montaño, E. A. & Recio-Pinto, E. Antagonistic action on NMDA/GluN2B mediated currents of two peptides that were conantokin-G structure-based designed. BMC Neurosci. 18, 44 (2017).
      6. Paul, S. M. et al. The Major Brain Cholesterol Metabolite 24(S)-Hydroxycholesterol Is a Potent Allosteric Modulator of N-Methyl-D-Aspartate Receptors. J. Neurosci. 33, 17290–17300 (2013).
      7. Yakovlev, A. V., Kurmasheva, E. D., Ishchenko, Y., Giniatullin, R. & Sitdikova, G. F. Age-Dependent, Subunit Specific Action of Hydrogen Sulfide on GluN1/2A and GluN1/2B NMDA Receptors. Front. Cell. Neurosci. 11, 375 (2017).
      8. Kellner, S. et al. Two de novo GluN2B mutations affect multiple NMDAR-functions and instigate severe pediatric encephalopathy. eLife 10, e67555 (2021).
      9. Sabo, S. L., Lahr, J. M., Offer, M., Weekes, A. L. & Sceniak, M. P. GRIN2B-related neurodevelopmental disorder: current understanding of pathophysiological mechanisms. Front. Synaptic Neurosci. 14, (2023).
      10. Martel, M.-A. et al. The Subtype of GluN2 C-terminal Domain Determines the Response to Excitotoxic Insults. Neuron 74, 543–556 (2012).
      11. Papouin, T. et al. Synaptic and Extrasynaptic NMDA Receptors Are Gated by Different Endogenous Coagonists. Cell 150, 633–646 (2012).
      12. Harris, A. Z. & Pettit, D. L. Extrasynaptic and synaptic NMDA receptors form stable and uniform pools in rat hippocampal slices. J. Physiol. 584, 509–519 (2007).
      13. Moldavski, A., Behr, J., Bading, H. & Bengtson, C. P. A novel method using ambient glutamate for the electrophysiological quantification of extrasynaptic NMDA receptor function in acute brain slices. J. Physiol. 598, 633–650 (2020).
      14. Curras, M. C. & Dingledine, R. Selectivity of amino acid transmitters acting at N-methyl-D-aspartate and amino-3-hydroxy-5-methyl-4-isoxazolepropionate receptors. Mol. Pharmacol. 41, 520–526 (1992).
      15. Laube, B., Hirai, H., Sturgess, M., Betz, H. & Kuhse, J. Molecular Determinants of Agonist Discrimination by NMDA Receptor Subunits: Analysis of the Glutamate Binding Site on the NR2B Subunit. Neuron 18, 493–503 (1997).
      16. Esmenjaud, J. et al. An inter‐dimer allosteric switch controls NMDA receptor activity. EMBO J. 38, (2019).
      17. Liu, S. et al. A Rare Variant Identified Within the GluN2B C-Terminus in a Patient with Autism Affects NMDA Receptor Surface Expression and Spine Density. J. Neurosci. 37, 4093–4102 (2017).
      18. Geoffroy, C., Paoletti, P. & Mony, L. Positive allosteric modulation of NMDA receptors: mechanisms, physiological impact and therapeutic potential. J. Physiol. 600, 233–259 (2022).
      19. Malayev, A., Gibbs, T. T. & Farb, D. H. Inhibition of the NMDA response by pregnenolone sulphate reveals subtype selective modulation of NMDA receptors by sulphated steroids. Br. J. Pharmacol. 135, 901–909 (2002).
      20. Petrov, A. M. et al. CYP46A1 Activation by Efavirenz Leads to Behavioral Improvement without Significant Changes in Amyloid Plaque Load in the Brain of 5XFAD Mice. Neurotherapeutics 16, 710–724 (2019).
      21. Mony, L., Zhu, S., Carvalho, S. & Paoletti, P. Molecular basis of positive allosteric modulation of GluN2B NMDA receptors by polyamines. EMBO J. 30, 3134–3146 (2011).
      22. Stroebel, D., Casado, M. & Paoletti, P. Triheteromeric NMDA receptors: from structure to synaptic physiology. Curr. Opin. Physiol. 2, 1–12 (2018).
      23. Hansen, K. B., Ogden, K. K., Yuan, H. & Traynelis, S. F. Distinct Functional and Pharmacological Properties of Triheteromeric GluN1/GluN2A/GluN2B NMDA Receptors. Neuron 81, 1084–1096 (2014).
      24. Stroebel, D., Carvalho, S., Grand, T., Zhu, S. & Paoletti, P. Controlling NMDA Receptor Subunit Composition Using Ectopic Retention Signals. J. Neurosci. 34, 16630–16636 (2014).
      25. Clements, J. D., Lester, R. A. J., Tong, G., Jahr, C. E. & Westbrook, G. L. The Time Course of Glutamate in the Synaptic Cleft. Science 258, 1498–1501 (1992).
      26. Budisantoso, T. et al. Evaluation of glutamate concentration transient in the synaptic cleft of the rat calyx of Held: Glutamate concentration in synapse. J. Physiol. 591, 219–239 (2013).
      27. Kellner, S. et al. Two de novo GluN2B mutations affect multiple NMDAR-functions and instigate severe pediatric encephalopathy. eLife 10, e67555 (2021).
      28. McAllister, A. K. & Stevens, C. F. Nonsaturation of AMPA and NMDA receptors at hippocampal synapses. Proc. Natl. Acad. Sci. 97, 6173–6178 (2000).
      29. Ishikawa, T., Sahara, Y. & Takahashi, T. A Single Packet of Transmitter Does Not Saturate Postsynaptic Glutamate Receptors. Neuron 34, 613–621 (2002).
      30. Washbourne, P., Liu, X.-B., Jones, E. G. & McAllister, A. K. Cycling of NMDA Receptors during Trafficking in Neurons before Synapse Formation. J. Neurosci. 24, 8253–8264 (2004).
      31. Yan, Y.-G. et al. Clustering of surface NMDA receptors is mainly mediated by the C-terminus of GluN2A in cultured rat hippocampal neurons. Neurosci. Bull. 30, 655–666 (2014).
      32. Kussius, C. L. & Popescu, G. K. Kinetic basis of partial agonism at NMDA receptors. Nat. Neurosci. 12, 1114–1120 (2009).
      33. Sun, W., Hansen, K. B. & Jahr, C. E. Allosteric interactions between NMDA receptor subunits shape the developmental shift in channel properties. Neuron 94, 58-64.e3 (2017).
      34. Benveniste, M. & Mayer, M. L. Kinetic analysis of antagonist action at N-methyl-D-aspartic acid receptors. Two binding sites each for glutamate and glycine. Biophys. J. 59, 560–573 (1991).
      35. Lü, W., Du, J., Goehring, A. & Gouaux, E. Cryo-EM structures of the triheteromeric NMDA receptor and its allosteric modulation. Science 355, eaal3729 (2017).
      36. Vyklicky, V., Stanley, C., Habrian, C. & Isacoff, E. Y. Conformational rearrangement of the NMDA receptor amino-terminal domain during activation and allosteric modulation. Nat. Commun. 12, 2694 (2021).
      37. Stroebel, D., Casado, M. & Paoletti, P. Triheteromeric NMDA receptors: from structure to synaptic physiology. Curr. Opin. Physiol. 2, 1–12 (2018).
      38. Borza, I. & Domany, G. NR2B Selective NMDA Antagonists: The Evolution of the Ifenprodil-Type Pharmacophore. Curr. Top. Med. Chem. 6, 687–695 (2006).
      39. Tang, Y. P. et al. Genetic enhancement of learning and memory in mice. Nature 401, 63–69 (1999).
      40. Gonda, S. et al. GluN2B but Not GluN2A for Basal Dendritic Growth of Cortical Pyramidal Neurons. Front. Neuroanat. 14, (2020).
      41. Sceniak, M. P. et al. A GluN2B mutation identified in Autism prevents NMDA receptor trafficking and interferes with dendrite growth. J. Cell Sci. jcs.232892 (2019) doi:10.1242/jcs.232892.
      42. Philpot, B. D. et al. Effect of transgenic overexpression of NR2B on NMDA receptor function and synaptic plasticity in visual cortex. Neuropharmacology 41, 762–770 (2001).
      43. Barria, A. & Malinow, R. Subunit-Specific NMDA Receptor Trafficking to Synapses. Neuron 35, 345–353 (2002).
      44. Platzer, K. et al. GRIN2B encephalopathy: novel findings on phenotype, variant clustering, functional consequences and treatment aspects. J. Med. Genet. 54, 460–470 (2017).
      45. Green, T., Rogers, C. A., Contractor, A. & Heinemann, S. F. NMDA Receptors Formed by NR1 in Xenopus laevis Oocytes Do Not Contain the Endogenous Subunit XenU1. Mol. Pharmacol. 61, 326–333 (2002).
      46. Swanger, S. A. et al. Mechanistic Insight into NMDA Receptor Dysregulation by Rare Variants in the GluN2A and GluN2B Agonist Binding Domains. Am. J. Hum. Genet. 99, 1261–1280 (2016).
      47. Madry, C., Betz, H., Geiger, J. R. P. & Laube, B. Supralinear potentiation of NR1/NR3A excitatory glycine receptors by Zn2+ and NR1 antagonist. Proc. Natl. Acad. Sci. 105, 12563–12568 (2008).
      48. Regalado, M. P., Villarroel, A. & Lerma, J. Intersubunit Cooperativity in the NMDA Receptor. Neuron 32, 1085–1096 (2001).
      49. Durham, R. J. et al. Conformational spread and dynamics in allostery of NMDA receptors. Proc. Natl. Acad. Sci. 117, 3839–3847 (2020).
      50. Vyklicky, V., Stanley, C., Habrian, C. & Isacoff, E. Y. Conformational rearrangement of the NMDA receptor amino-terminal domain during activation and allosteric modulation. Nat. Commun. 12, 2694 (2021).
      51. Kellner, S. et al. Two de novo GluN2B mutations affect multiple NMDAR-functions and instigate severe pediatric encephalopathy. eLife 10, e67555 (2021).
    1. Now, when data is transformed into evidence, when we isolate or distill the features of a data set, or when we generate a visualization or present the results of a statistical procedure, we are not presenting the artifact. These are abstractions. The data itself has an artifactual quality to it. What one researcher considers noise, or something to be discounted in a dataset, may provide essential evidence for another.

      When it comes to data analysis, I usually think of data as a source of information rather than it being a research object by itself. The term “raw data” has been used in all my classes, starting from accounting and finishing with introduction to digital culture and information. Yes, we’ve talked about biases that come up in different data sets, but usually this conversation is related to so-called “post-production” of data – either us, students, using it, or someone else and we reverse engineered where it came from. So, reading about an approach to data, even ‘raw’ data, as a constructed artifact is very refreshing. It’s extremely important to look at how the raw data was collected and what was left out by collectors initially to have a full image of what’s going on.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1

      Major:

      - The statement (line 149'Together, our data suggest that systemic ecdysone levels are unlikely to be involved in modulating tumour-induced muscle detachment or to mediate the role of fatbody Insulin signalling in regulating muscle detachment.') is derived from an experiment with sterol free diet (in which 20HE is genetically addressed) and a pleiotropic experiment (PG>RasG12V). In neither paper nor the current manuscript, 20HE levels have been directly addressed.

      Therefore, this statement needs further experimental support and discussion. Ecdysone is a critical hormone during development and especially growth-related effects central to this study. The authors should consider doing pharmacology or augment their claims here with genetic manipulation experiments of 20HE related genes in larvae (Leopold, Rewitz, Rideout, Drummond-Barbosa, Schuldiner labs) and adult animals using genetics, pharmacology or direct assessment of 20HE levels (RIPA, Edgar and Reiff labs).

      The main point we were trying to convey is that we do not think global ecdysone levels play a role in modulating fatbody insulin or TGF-β signalling, which in turn affects muscle detachment. We are not claiming that ecdysone levels do not change between control and tumour-bearing animals. In fact, we predict that 20HE levels will differ between tumour-bearing and control animals (as tumour-bearing animals undergo developmental delay), but this is not the main point of our conclusions. We believe that our conclusions are supported by the experiment demonstrating that global ecdysone alterations (via feeding sterol-free food) did not affect how fatbody Akt activation altered TGF-β signalling and enhanced muscle integrity (Figure S1). Therefore, we don’t think measuring 20HE would help to support our conclusions. Pharmacological inhibition via feeding ecdysone inhibitors would effectively demonstrate a similar point to feeding sterol-free food, which we have already performed. We are happy to try direct manipulation of 20HE-related genes (eip75B-RNAi) in the fatbody to see if this affects muscle detachment or pAkt and pMad levels in tumour-bearing animals.

      - In Fig.7 the authors used a sog-LacZ stock to show transcriptional activation in fatbody cells. This stock is based on P-element insertion in the according regulatory regions and supposed to express lacZ with an nls. I can clearly see lacZ in nuclei in Fig. 7H, whereas this is very hard to see in nuclei in Fig7i in the tumour model. In addition, lacZ is known for its high stability and not the best option. As this finding is vital for central claims of this study, it should be complemented by either qPCR for sog on fat body cells or using another readout by converting one of the two Mimic lines (BL42189/44958) into GFP sensors for sog.

      We will add a counterstain to these images. We will also perform qPCR in the fatbody of control and cachectic animals to assess whether Sog transcription is altered. We agree that converting one of the Mimic lines to a GFP sensor would be a good option, but this experiment would require getting new fly lines into Australia, which takes at least 2 months because of quarantine laws. We don’t believe this experiment would change the general conclusions of the paper, and would therefore prefer not to perform it.

      - I have similar problems with Fig.7B-F, as phosphorylated Mad should be translocated to the nucleus. In 7F the authors measure pMad over Dapi, which is the right way, but it is hard to see pMad in the nucleus apart from Fig7B, whereas in D and E, where the authors measure higher levels, I cannot identify clear pMad in nuclei. These images either need to show the Dapi channel or more representative images should be chosen like in Fig.4 with arrows pointing to measured nuclei. Fig.7C something went wrong with the compression of this image.

      We will show more representative examples and fix Fig 7C.

      - The proper function of RNAi stocks targeting genes like sog, mad, etc. is vital for this study as these lines are used throughout the study. Functional evidence of specific knockdown efficiency should be provided or references given in which these stocks were shown to provide functional knockdown on transcript or protein level.

      We agree with the reviewer that this is an important point. We will demonstrate the knockdown of sog and mad (and other RNAis) used in the study, either by referring to published data or by demonstrating knockdown ourselves.

      - Fig.S7 discusses appearance of gbb/Bmp7 and Sog/CHRD in human patients. The analysis the authors performed shows a correlation between both factors, but is hampered by the fact that datasets for peripheral tissues of cachexia patients are unavailable. The authors may consider sorting these after tumor entities in which cachexia occurs frequently vs. low occurrence and then check for both genes.

      We will try this analysis.

      Fig.5 M-P pMad is not indicated in the panels, only in the legend.

      We will fix this error.

      - Please follow FlyBase nomenclature, e.g. dlg1 for discs large 1 and unify in the whole manuscript and figure for all genes.

      We will fix this error.

      - For endogenous fusion proteins like Viking-GFP (e.g. vkg::GFP) choose a format to clearly decipher them from transcriptional readout stocks like sog-lacZ.

      We will fix this error.

      - The quantifications in most figures are quite small with tiny lettering and XY axis are difficult to read in letter/A4 size.

      We will enlarge font size.

      Minor:

      1. Adjust in-figure caption alignments

      2. Line 104: add comma RasV12, dlgRNAi

      3. Line 114: replace 'little' with 'not significant (n.s.)'

      4. Line 334: 'sogRNAi overexpression' to my knowledge, RNAi are expressed, not overexpressed.

      5. Line 454: italicize r4>

      6. Fig S4E: remove frame

      7. Figures 6: It would be better to number and explain the pathway presented in the figure in text and fig legend.

      8. Just a personal preference. Lettering of images in images is commonly done horizontally, here it appears like a mix between vertical and horizontal.

      We will fix these minor errors.

      Reviewer #2

      Major comment

      Their genetic experiments clearly showed that the reduction of insulin signaling activity in the fatbody induces upregulation of TGF-β signaling and Collagen accumulation. Then, how does TGF-β signaling induce Collagen accumulation?

      From the experiments we have carried out, we do not have insights into how TGF-β signalling induces Collagen accumulation.

      They showed that Rab10 knockdown and SPARC overexpression reduced the accumulation of fatbody ECM. Are Rab10 and SPARC expression regulated by TGF-β signaling?

      We can address this point by assessing if Rab10 and SPARC expression is altered in cachectic fatbody.

      Minor comments

      Line 90: "Disc Large (Dlg) RNAi in the eye" must be "Discs Large (Dlg1) RNAi in the eye imaginal discs".

      We will fix this error.

      Figures 1D and 1L are from the same image. Also, Figures 1C and 1M are from the same image. Are both of them necessary to be shown in the different panels?

      The duplication of 1C and 1M was an error; we thank the reviewer for picking this up. We will fix this error. We will use different images for 1D and 1L.

      Why are the staining patterns of anti-pAkt shown in Figures 1L and 1U so different? pAkt is not detected in the nuclei in Fig. 1L but its nuclear signal is clear in Fig. 1U.

      We will show more representative images of this staining.

      Figure 1: Images of counter staining for nuclei like DAPI should be also included for all these fatbody images.

      We will show counter staining for DAPI.

      Line 101: "Tumour specific ImpL2 inhibition was sufficient to reduce fatbody pAkt levels." Is this correct? ImpL2 inhibition in tumors should elevate the pAKT level in fatbody.

      This was a mistake; we will fix this error.

      Figure S1~S4: These figures and their legends do not correspond to each other.

      We thank the reviewer for picking up this error. There was an error in inserting the images into the text: S2 and S3 were swapped.

      We will fix this error.

      Line 189: The pAkt level in the muscle of tumour-bearing animals should be examined to confirm the activity of the insulin signaling is downregulated.

      We will include this data.

      Line 189: If the authors conclude that muscle insulin signaling predominantly regulates translation and atrophy, OPP assay for the muscle cells should be examined in the same experimental settings.

      We will carry out OPP assay upon Akt overexpression in the muscle.

      Line 247: The expression level of Rab10 and SPARC should be examined in the fatbody of tumour-bearing animals to see whether Rab10 is upregulated and SPARC is downregulated.

      Line 247: If Rab10 upregulation and SPARC downregulation are the causes of the accumulation of ECM proteins in the fatbody of tumour-bearing animals, how the overexpressed Collagen proteins can be secreted from the fatbody cells?

      We are not sure, but the overexpression of Collagen proteins is at an extremely high level; it is therefore possible that some of it can be processed and secreted despite Rab10 upregulation and SPARC downregulation. We have carried out an experiment to overexpress Collagen proteins in the muscle; in this case, the manipulation did not rescue. This indicates that processing of Collagen in the fatbody is important; however, we do not know how the processing is regulated.

      Line 347: Sog is a secreted BMP antagonist. Thus, it can be expected that the Sog overexpression downregulates TGF-β signaling in fatbody and muscle tissues. If the rescued phenotypes with Sog overexpression can be explained by this logic, pMad level should be examined in these experiments.

      We have shown this data in Figure R-T. We will refer back to this data in Line 347.

      Reviewer #3

      Major comments:

      - Are the key conclusions convincing?

      Most of the conclusions are convincing. It is not clear however whether the ECM accumulation in the fat body of tumor animals is fibrotic and whether it is extracellular or in the cell cortex.

      - Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      -The authors state in line 71 'This deposition of disorganized ECM leads to fibrotic ECM accumulation.' The authors haven't really provided evidence for the ECM being fibrotic. The authors could either rephrase this or provide additional experimental evidence of fibrosis in the fat body.

      We will tone down the claim that the ECM accumulation is fibrotic.

      - Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      -The authors state in line 147" Finally, in tumor-bearing animals fed a sterol-free diet, that underwent a prolonged 3rd instar stage due to reduced ecdysone levels (Parkin and Burnet, 1986), we activated insulin signalling in the fatbody via Akt overexpression (QRasV12, scribRNAi). We found that this manipulation caused a significant decrease in pMad levels in the fatbody and a rescue of muscle detachment (Figure S1 D-I), similar to animals fed a standard diet (Figure 1 O-Q, Figure 2 F-H)." Since it's not already known what the extent of muscle integrity defect there is in tumors with additional sterol free diet, it would be important to show a non-tumor control for comparison in FigS1F. This would also then make it clear to what extent the defect is rescued by Akt overexpression.

      We will include a non-tumour control for Fig S1F.

      -The authors state in line 158 'Upon the knockdown of Impl2, we found that tumor gbb was not significantly altered (Figure S3A).' Even though this shows an indication that Gbb levels are not reduced, the n number is too low to state that it is non-significant. The authors should increase the n number here.

      N=3 is generally enough to see a difference. We will include data obtained in parallel showing that ImpL2 RNAi is sufficient to reduce ImpL2 RNA levels. This will demonstrate that n=3 is sufficient to detect a reduction in transcript levels if there is one.

      -The authors state in line 171 'Conversely, knockdown of gbb alone or knockdown of gbb together with ImpL2 significantly rescued the Nidogen overaccumulation defects observed at the plasma membrane of fatbody from tumor-bearing animals, while ImpL2RNAi alone did not (Figure S2 Q-U).' This is a somewhat misleading representation, since again no non-tumor control was used, so the extent of the rescue by gbb knockdown is not obvious. In FigS2P Nidogen levels in the tumor seem ~100% higher than in control. But in FigS2U, in which no control was included, the tumor+gbb knockdown seems ~ 20% lower than tumor. So it is probably a more moderate rescue, but that's only possible to assess by including a non-tumor control in FigS2U. Also the images in FigS2Q-T don't seem representative since they appear to show a much bigger difference in fluorescence intensity than ~20%. Please show more representative images.

      We will include a non-tumour control for S2Q-T and show more representative pictures.

      -The authors state in line 174 'Finally, co-knockdown of gbb and ImpL2 in the tumor significantly rescued the reduction in OPP and Nidogen levels observed in the muscles of tumor-bearing animals (Figure S3 B-I).'

      Again, the single knockdowns and the non-tumor control are not shown in FigS3E and I and should be included for comparison and to see the contribution of each knockdown and to be able to judge the extent of the rescue.

      We will include the single knockdowns and a wildtype control.

      -Regarding Fig3O: Is there a significant tumor muscle attachment defect here? In this graph the tumor only looks about 10% lower than the WT (rather than 40% in Fig2E). The other issue is the extremely low n number for WT. I would recommend increasing the n number for WT here and to indicate in the graph whether the tumor is significantly different to WT (or non-significant, in which case RabRNAi wouldn't actually 'rescue' the defect). In the present form, this graph is not very convincing.

      We will increase the n number for WT for this experiment. The reduction is 10% rather than 40% here because this experiment was done at day 6, which we will indicate in the figure legend. The 40% reduction in Fig2E arises because those samples were processed at day 7. The Rab10RNAi experiment was carried out at day 6 because, by day 7, the Rab10RNAi rescue is so strong that most of the tumour-bearing animals have pupated; thus the experiment could only be carried out at day 6.

      - Regarding Fig3W: A non-tumor control would be important to include to be able to judge the extent of muscle attachment defects and the extent of the rescue for UAS-Sparc. This will make it possible to assess the severity of the muscle integrity defect in this particular experiment (since it appears to vary between experiments, e.g. the muscle defect in tumor is 40% in Fig2E and ~10% in Fig3O) and to assess the extent of rescue for the various genotypes.

      We will include a non-tumour control for 3W.

      -The authors show an accumulation of ECM in the fat body of tumors. It is not clear, whether this ECM accumulates intracellularly near the cell surface or extracellularly. The authors should assess this, maybe by doing electron microscopy.

      We do not have an EM facility that can accommodate this experiment, so EM is not an option for us. However, we can address whether the accumulation of ECM is intracellular or extracellular by performing antibody staining against Viking-GFP without permeabilizing the cells. If Viking is detected without permeabilization, it would indicate that the accumulations are extracellular. This approach has previously been used to address this question in Zang et al., eLife, 2015.

      - Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      -These suggested experiments should be quite straightforward since they are mostly just repeating previous experiments with the appropriate controls and n numbers. I would think that they can be done within a few months. The electron microscopy should not take more than a few weeks and not be costly.

      - Are the data and the methods presented in such a way that they can be reproduced?

      -The ages of the animals used in each experiment are not easy to find and are not stated very clearly. They should be included in each figure legend rather than summarised only in the methods.

      We will add the number of days in the figure legend.

      -Also, in line 788 in the methods, several stocks are indicated as coming from particular labs (e.g. UAS-FOXO (Kieran Harvey), UAS-GFP (Kieran Harvey), UAS-lacZRNAi (Kieran Harvey), UAS-RasV12 (Helena Richardson), UAS-cg25C;UAS-Vkg (Brian Stramer)).

      However, it is not clear whether these labs actually made these stocks and if so whether it has already been described in their papers how the lines were made. If the lines are unpublished, the detailed information should be given on how the lines were made. Or if the lines are published, the authors should provide the reference.

      We will fix these references.

      - Are the experiments adequately replicated and statistical analysis adequate?

      In general, the n number is rather low in several experiments, especially n of 3 for many controls. And as I mentioned before, rescues of tumor phenotypes are often shown without including a non-tumor control, making it hard to judge the extent of the rescue. Sometimes this information can be found in other figures, but the reader should not have to search for it. And also the severity of the phenotype can vary from experiment to experiment.

      We will include a non-tumour control when appropriate to address this.

      Minor comments:

      - Specific experimental issues that are easily addressable.

      - Are prior studies referenced appropriately?

      Yes, as far as I can tell.

      - Are the text and figures clear and accurate?

      -In the literature, people usually call it 'fat body' rather than 'fatbody'.

      We will fix this error.

      -The authors state in line 265 "Vkg accumulated in the membranes of fatbody where p60 was overexpressed using r4-GAL4 (Figure 5 A-C)."

      This must be a typo. I think it is shown in Fig5E-G. Unless it's labelled wrongly in the figure and B, C and D show p60 rather than TorDN.

      We will fix this error.

      -The authors state in line 188 'This manipulation significantly rescued muscle integrity (Figure S4 A-C) and muscle atrophy (Figure S4 D-F), without affecting muscle ECM levels (Figure S4 G-H).' According to the graph in FigS4H this does actually 'affect muscle ECM levels' significantly, in that it reduced Nidogen levels further. The authors could rephrase this.

      We will reword this statement.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This important work reports the identification of a list of proteins that may participate in the clearance of paternal mitochondria during fertilization, which is known to be essential for normal fertilization and embryonic and fetal development. While the main method used is state of the art and the supporting data are solid, the rigor of the biochemical assays and function validation is inadequate. This work will be of interest to developmental and reproductive biologists working on fertilization. Key revisions (for the authors) include 1) Use a mitochondria-enriched fraction instead of whole sperm for the assays, and add more control samples to monitor what got lost during sperm and oocyte treatments before the coincubation step. 2) Functional validation of the key proteins identified.

      We thank the Editors of eLife, as well as the Special Issue Guest Editors and Reviewers, for a favorable assessment and helpful recommendations for key revisions. Provisional revisions included in our revised article are detailed below. We agree with the Editors’ comment about the use of mitochondrion-enriched fractions and additional functional validation of key proteins. In fact, we are developing experimental protocols for oocyte extract co-incubation with isolated sperm heads and tails, and eventually with purified mitochondrial sheaths, to separate the ooplasmic sperm nucleus remodeling factors from the mitophagic ones. Such experiments, as well as functional validations using porcine zygotes, are contingent upon an anticipated post-pandemic rebound in the availability of porcine oocytes. These are obtained from ovaries harvested on slaughterhouse floors, which requires a workforce that is currently unavailable, and this has hampered our access to this necessary resource.

      Reviewer #1 (Peer Review):

      Could the authors make clear how much the presented pictures reflect the described localisation? There is no information on the number of spermatozoa and embryos observed nor the fraction of these embryos showing the presented pattern of localisation. This must be included.

      Two hundred spermatozoa were counted per replicate of the cell-free system co-incubation and 20 zygotes per replicate, with 3 replicates of immunolabelling for each phase/picture, which were examined to establish the typical localization patterns. The displayed patterns were observed in 65 to 88% of examined spermatozoa/zygotes, varying with protein, replicate, and phase of immunolabelling. In all cases, the signal displayed is the typical pattern seen in most cells. This information has been added to the Materials and Methods section for clarification.

      It is not clear if the authors also examined the localization of other proteins and obtained a different pattern than anticipated from the proteomic approach or if they only tested these 6 proteins and got a 100% of correlation.

      These are the 6 proteins which were selected based on extensive literature review into known functions of all identified proteins, as well as extensive research into available and reliable antibodies to detect such proteins within our porcine systems. Even so, no particular localization patterns were anticipated; instead, we presented the patterns actually observed and even some patterns which defied our expectations (i.e., the localization of BAG5 in the sperm acrosome).

      The authors use "MS" in the text to indicate both "mitochondrial sheath" and "mass spectrometry". This is confusing.

      The authors agree and the usage of MS as an acronym for either has been removed entirely to avoid confusion.

      In the introduction the author refers to Ankel-Simons and Cummins, 1996 as a reference for the number of sperm mitochondria in mammalian species, this is incorrect since the quoted paper is about the number of mtDNA molecules and mentioned an earlier publication.

      This has been revised and the appropriate citation has been used.

      Reviewer #2 (Peer Review):

      Major:

      1) Earlier studies from this group have shown that the porcine cell-free system is useful for observing spermatozoa interacting with ooplasmic proteins in a single trial and can recapitulate the sperm mitophagy events that take place in a zygote at fertilization without affecting the later cell-division process. However, post-fertilization sperm mitophagy is a complex, time-dependent event in which many processes occur sequentially and interactively, which means ooplasmic proteins might be involved in this process without directly interacting with sperm, or may associate with the sperm-ooplasmic protein complex at different time points. Identifying "the candidate players" from the list of 185 proteins is certainly already a great advance in knowledge; however, with the time resolution (4 and 24hr) of the current study and without functional validation experiments at this stage, it is still difficult to postulate the importance of these identified proteins. Functional validation experiments are, in my opinion, critically important for better interpretation of the data.

      The authors agree with this reviewer’s sentiments and do plan to conduct further functional analysis. This project generated a list of candidate sperm-mitophagy-promoting proteins, and we were further able to show that many of these proteins were detectable both via mass spectrometry and via immunocytochemistry in spermatozoa exposed to our cell-free system. Furthermore, similar localization patterns were found in spermatozoa detected within newly fertilized zygotes. These results boost our confidence in our cell-free system and show that our list of candidate proteins is truly a useful list for future localization and functional analyses. We are certainly aware that we have not captured every protein that may play a role in post-fertilization sperm mitophagy and that the proteins captured are just candidates until proven otherwise. Likewise, we have almost certainly captured multiple candidate proteins that will not be shown to play a role in post-fertilization sperm mitophagy. It is plausible that at least some of these candidate proteins do play a role in mitophagy, and some of them likely participate (perhaps in roles yet to be described) in other fertilization events, in which we would be extremely interested as well.

      2) As shown in Figure 1, whole sperm was used in the co-incubation and the later MS analysis; thus, proteins identified in the current study might be relevant to fertilization processes other than post-fertilization sperm mitophagy, as they may be associated with parts of the sperm other than the midpiece (e.g., a sticky sperm head; PSMG2 is associated with the sperm midpiece and tail at 4hr of co-incubation, but only with the sperm head at 24hr of co-incubation). Although the authors applied immunohistochemistry to show the localization of these proteins, the evidence is indirect. How do the authors functionally differentiate the role of these 6 identified proteins in the sperm mitophagy process from roles in other processes, and how do they confirm (or associate) the relevance of these proteins to the sperm mitophagy process?

      The authors agree that the 6 proteins which were further studied using immunocytochemistry may be playing roles in other processes such as pronuclear formation. We discussed some potential roles, including and beyond post-fertilization mitophagy, in the Supplemental Discussion. In response to reviewer comments, we moved the Supplemental Discussion back into the main Discussion section. Thus, this section now considers additional putative pathways in which the said 6 proteins could participate, though we concede that thorough functional studies must still be performed.

      3) Class 3 proteins were present in both gametes or only in the primed control spermatozoa, but were decreased in the spermatozoa after co-incubation, which the authors interpreted as sperm-borne mitophagy determinants and/or sperm-borne proteolytic substrates of the oocyte autophagic system. This categorization may need to be revised to sperm-borne proteolytic substrates of the oocyte autophagic system only, not sperm-borne mitophagy determinants. The argument is that if a protein is a sperm-borne mitophagy determinant, then after co-incubation, to execute the mitophagy process, it should still be associated with the sperm at least at the early stage (4hr), i.e., constant under MS detection when comparing control with 4hr-treated sperm, rather than being released from the sperm. Or alternatively, they could result in class 3 proteins (but not all those 6 were in class 3). Nevertheless, if these proteins serve as substrates, they can be used (consumed) and show a decrease under MS detection.

      This argument for redefining the Class 3 proteins more accurately is understood and we agree. The definition is revised in the paper.

      4) Of particular interest among the 6 proteins that were further investigated. Unlike other proteins, MVP was highly significant (p<0.001) after 4hr incubation, but the significance became less after 24hr (p=0.19). Interpretation of this dynamic change in the relevance of the mitophagy process would facilitate the readers to understand the relevance and the role of MVP.

      The differences in significance are likely influenced by the abundance of MVP detectable by mass spectrometry. As the time of cell-free system incubation increases, the variability between replicates also seemed to increase, likely due to the sustained proteolytic activity taking place in our system. This work was based on three replicates of mass spectrometry for each time point; additional replicates likely would have reduced the p-value for the 24hr cell-free data set, for MVP and potentially other proteins also. At both time points, MVP was only detectable in spermatozoa after they had been exposed to the cell-free system treatment. This is the criterion that truly interested us, more than the actual differences in content between the time points, and is why MVP was added to our list of candidate proteins.

      5) In figure 3, the association of ooplasmic MVP to sperm midpiece is not convincing enough as sperm midpiece and tail often show some levels of non-specific signals under fluorescent microscopy. And the dynamic association of ooplasmic MVP to sperm midpiece in Fig. 3F-G is difficult to reach a conclusion solely based on data presented in the manuscript. Additional negative control of sperm MVP staining from the primed and treated sperm would be helpful. Additionally, a quantitative comparison (15 vs 25hr) of sperm-associated MVP signals from the fertilized embryo or a stack image from different angles would clarify the doubts raised here.

      For all images and all replicates, serum controls were also generated. These controls were then viewed under a fluorescence microscope, and light intensity and exposure thresholds for each fluorescent light channel were set based on the background intensity that came from these nonimmune serum-treated control samples. We set our light intensity/acquisition time below the threshold at which non-specific signal began to appear. All the presented patterns are based on this peak intensity threshold, and as such the signal we see should be the true signal. Furthermore, 200 spermatozoa were counted per treatment per replicate of the cell-free system co-incubation and 20 zygotes per replicate, with 3 replicates of immunolabelling for each protein and data point, which were used to establish the typical localization patterns. The displayed patterns were observed in 65-88% of examined spermatozoa/zygotes. Invariably, the signal displayed in the manuscript is the typical pattern that was seen in a majority of cells. This information has now been added to the Materials & Methods section for clarification.

      6) Same concerns for the other 5 proteins (PSMG2, PSMA3, FUNDC2, SAMM50, BAG5) as indicated above.

      See response to Question 5.

      7) The patterns of these 6 proteins in the immunofluorescence study are confusing, as the pattern varies after co-incubation (treated) and, mostly, the signal of these proteins observed in the fertilized embryos is not really associated with sperm midpieces. Therefore, the evidence that these proteins are involved in post-fertilization sperm mitophagy is, at this moment, weak based on the data presented. The relevance of these proteins to post-fertilization events or early embryo development is certain, but the evidence is not strong enough to support "sperm mitophagy," in my opinion.

      The authors agree that some of these proteins seem to be playing roles beyond post-fertilization sperm mitophagy and that there is a need for true functional studies before the authors can state with certainty that these proteins play a role in any of the discussed fertilization events. We state this in the discussion: “Considering the dynamic proteomic remodeling of both the oocyte and spermatozoa which takes place during early fertilization, these 185 proteins which have been identified likely play roles in processes beyond sperm mitophagy.” It should be noted that the authors went into greater detail about potential alternative protein functions based on the present data and literature review in the Supplemental Discussion. Based on this comment and other reviewer comments we have now included the Supplemental Discussion as part of the main Discussion section, and this will hopefully help clarify some of the authors’ thoughts about the 6 candidate proteins which were further analyzed during this study.

      Minor:

      1) To my understanding, statistical significance (relevance) is normally set at a p-value of either <0.1 or <0.05. The reason for loosening the threshold to a p-value of 0.2 in the current study needs to be justified, as this is not a common statistical criterion, and candidates identified under this loosened criterion should be interpreted with care.

      The loosening of the statistical threshold to 0.2 in this study applied only to our Class 1 proteins. This is because a protein fell into Class 1 only if it was present in samples solely after they were exposed to the cell-free system. For these Class 1 proteins, this happened in all 3 replicates at each stated time point. We found this pattern of detection to be important whether the p-value fell under 0.1 or 0.2. As such, we loosened our statistical threshold for our Class 1 proteins. Any proteins added to our candidate list will be subject to further investigation before definitive conclusions can be drawn, and as such we think that capturing more proteins was more important for the goals of this study than limiting the number of proteins captured, especially for those Class 1 proteins. An explanation of this has been added to the Materials & Methods section Mass Spectrometry Data Statistical Analysis.
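
      The Class 1 rule described in this response can be sketched as a small check. This is only an illustrative sketch under stated assumptions: the function name, its arguments, and the boolean per-replicate detection flags are hypothetical and are not taken from the authors' analysis pipeline; only the 0.2 threshold, the three replicates, and the "detected only after cell-free exposure" condition come from the response.

```python
# Hypothetical sketch of the Class 1 candidate rule described above.
# The names and the data layout are assumptions made for illustration only.

def is_class1_candidate(control_detected, treated_detected, p_value, alpha=0.2):
    """Return True if a protein meets the (assumed) Class 1 criteria.

    control_detected: per-replicate flags, True if detected in primed control sperm
    treated_detected: per-replicate flags, True if detected after cell-free exposure
    p_value:          p-value reported for the control vs. treated comparison
    alpha:            loosened threshold (0.2 for Class 1, per the response)
    """
    absent_in_controls = not any(control_detected)
    present_in_all_treated = all(treated_detected)
    return absent_in_controls and present_in_all_treated and p_value < alpha


# Example using the 24hr MVP p-value quoted by Reviewer #2 (p = 0.19);
# the detection flags are illustrative, not measured data.
mvp_loose = is_class1_candidate([False] * 3, [True] * 3, 0.19)
mvp_strict = is_class1_candidate([False] * 3, [True] * 3, 0.19, alpha=0.1)
print(mvp_loose, mvp_strict)
```

      Under these assumptions, a protein with the 24hr MVP p-value is retained at the loosened threshold but would be dropped at 0.1, which is the trade-off the response argues for.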

      2) First cell cleavage of porcine embryo normally occurs within 48hr post-insemination or activation; therefore, the 4 and the 24hr time points used in the current study require justification included in the discussion or methods and material section.

      First cleavage of porcine embryos normally occurs around 24-28 hours post-insemination. Thus, for both the cell-free system and the embryo studies, we were capturing an advanced 1-cell stage zygote/zygote-like system with our 24-hour and 25-hour time points.

      3) In Figure 2, the colors used at different time points and in the two different classes sometimes represent different protein categories. Quick comparisons would be easier for readers if the same color represented the same protein category throughout the graph (e.g., proteins for early zygote development are shown in red in "A" but blue in "B").

      This has been corrected and the color scheme for Figure 2 has been revised for easier comparisons.

      Reviewer #3 (Peer Review):

      I am not used to seeing a supplementary discussion in a manuscript. I also believe it should be incorporated into normal discussion.

      The Supplemental Discussion has been incorporated into the main Discussion now.

      It would be very helpful to make an additional figure in which the proposed interactome of identified factors with the sperm mitochondria before and after incubation are drawn schematically and also which factors are not IDed in both cases (when comparing to somatic mito- or autophagy). This eases to get through the discussion and will beautifully summarize and illustrate the importance and progress that the authors have made with this assay.

      We made a diagram that depicts the changes in protein localization patterns over time within our cell-free system. This diagram has been added to the manuscript as Figure 9.

      Reviewer #1 (Public Review):

      In this manuscript, the authors used an unbiased method to identify proteins from porcine oocyte extracts associated with permeabilised boar spermatozoa in vitro. The proteins are identified by mass spectrometry. A previous publication from this lab validated the cell-free extract purification methods as recapitulating early events after sperm entry into the oocyte. This novel method with mammalian gametes has the advantage that it can be done with many spermatozoa at a time, allowing the identification of proteins associated with many permeabilised boar spermatozoa at once. This allowed the authors to establish a list of proteins either enriched or depleted after incubation with the oocyte extract, or even only associated with spermatozoa after incubation for 4h or 24h. The total number of proteins identified in their test is around two hundred, with very few present in the sample only when spermatozoa were incubated with the extracts. The proteins identified using this approach and these criteria are likely associated with spermatozoa remnants after their entry and either removed or recruited for the transformation of spermatozoa-derived structures. Using WB and histochemistry labelling of spermatozoa and early embryos with specific antibodies, the authors confirmed the association/dissociation of 6 proteins suspected to be involved in autophagy.

      While this unique approach provides a list of potential proteins involved in sperm mitochondria clearance, it is (only) a starting point for many future studies. It does not demonstrate that any of these proteins indeed has a role in the processes leading to sperm mitochondria clearance, since the proteins identified may also be involved in other processes going on in the oocyte at this time of early development.

      We thank Reviewer 1 for the positive comments. We added a sentence to the Discussion addressing the obvious shortcoming of the present study, as further functional validations of candidate mitophagy factors are planned.

      Concerning the localisation of the 6 proteins further analysed, the authors must add how much the presented picture represents the observed patterns. They must include the details on the fraction of spermatozoa and embryos displaying the presented pattern.

      We now specify that the patterns depicted in the manuscript are typical and representative of data from at least three replicates of immunolabeling in spermatozoa and zygotes. For each of these replicates, 200 spermatozoa were examined per replicate of the cell-free system co-incubation, or 20 zygotes per replicate. The displayed patterns were observed in 65-88% of examined spermatozoa/zygotes. Invariably, the signal displayed in the manuscript is the typical pattern that was seen in a majority of cells. This information has now been added to the Materials & Methods section for clarification.

      Reviewer #2 (Public Review):

      Mitochondria are essential cellular organelles that generate ATP as the energy source for maintaining regular cellular functions. However, the degradation of sperm-borne mitochondria after fertilization, known as mitophagy, is a conserved event that ensures the exclusively maternal inheritance of the mitochondrial DNA genome. Defects in post-fertilization sperm mitophagy will lead to fatal consequences in patients. Therefore, understanding the cellular and molecular regulation of the post-fertilization sperm mitophagy process is critically important. In this study, Zuidema et al. applied mass spectrometry in conjunction with a porcine cell-free system to identify potential autophagic cofactors involved in post-fertilization sperm mitophagy. They identified a list of 185 proteins that might be candidates for mitophagy determinants (or their co-factors). Although 6 (out of 185) proteins were further studied, based on their known functions, using a porcine cell-free system in conjunction with immunocytochemistry and Western blotting to characterize the localization and modification changes of these proteins, no further functional validation experiments were performed. Nevertheless, the data presented in the current study are of great interest and could be important for future studies in this field.

      We thank Reviewer 2 for the positive comments. As we explain in our response to the Editors and Reviewer 1, further validation studies will be resumed once the availability of slaughterhouse ovaries for such studies improves. Examples of such functional validation of the pro-mitophagic proteins SQSTM1 and VCP are included in our previous studies (DOI: 10.1073/pnas.1605844113 and DOI: 10.3390/cells10092450), which led to the development of the cell-free system reported here and are cited in the present study.

      Reviewer #3 (Public Review):

      In this manuscript, a cytosolic extract of porcine oocytes is prepared. To this end, the authors aspirated follicles from ovaries obtained from the slaughterhouse and first matured the oocytes to the metaphase stage of meiosis II (one polar body). Cumulus cells (hyaluronidase treatment) and the zona pellucida (pronase treatment) were removed, and the resulting naked mature oocytes (1000 per portion) were extracted in a buffer containing a divalent cation chelator, beta-mercaptoethanol, protease inhibitors, and a creatine kinase phosphocreatine cocktail for energy regeneration. The preparation was subsequently frozen and thawed three times in liquid nitrogen and crushed by 16 kG centrifugation. The supernatant (1.5 mL) was harvested and 10 microliters of it was used for interaction with 10,000 permeabilized boar sperm (10 microliters of extract thus represents the cytosol fraction of 6.67 oocytes). The sperm were treated in this assay with DTT and lysoPC to prime the sperm's mitochondrial sheath. After incubation and washing, these preps were used for Western blot (see point 2), for fluorescence microscopy, and for proteomic identification of proteins.
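
      The oocyte-equivalent figure quoted in this summary follows from simple proportionality, which the short sketch below reproduces. The function name is hypothetical, and the calculation assumes, as the summary implies, that the cytosol of all 1000 oocytes is recovered evenly in the 1.5 mL supernatant.

```python
# Hypothetical helper reproducing the reviewer's oocyte-equivalent arithmetic.
# Assumes the cytosol of all oocytes is evenly distributed in the supernatant.

def oocyte_equivalents(n_oocytes, extract_volume_ul, aliquot_ul):
    """Number of oocytes whose cytosol is represented in one aliquot."""
    return n_oocytes * aliquot_ul / extract_volume_ul


# Values taken from the summary: 1000 oocytes, 1.5 mL (1500 microliters)
# of supernatant, and a 10 microliter aliquot per 10,000 permeabilized sperm.
equiv = oocyte_equivalents(1000, 1500.0, 10.0)
sperm_per_oocyte_equivalent = 10_000 / equiv

print(round(equiv, 2))                     # oocyte equivalents per aliquot
print(round(sperm_per_oocyte_equivalent))  # sperm per oocyte equivalent
```

      Under that assumption, each assay pairs roughly 1500 permeabilized sperm with the cytosol of a single oocyte, which gives context for the reviewer's concern about protein yield per assay.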

      Points for consideration:

      1) The treatment of sperm cells with DTT and lysoPC will permeabilize sperm cells but will also cause the liberation of soluble proteins as well as proteins that may interact with sperm structures via oxidized cysteine groups (disulfide bridges between proteins that will be reduced by DTT).

      This is certainly a possibility. The lysoPC and DTT permeabilization steps were designed to mimic the natural processing (plasma membrane removal and sperm protein disulfide bond reduction) that spermatozoa undergo during fertilization. We do realize that this is chemically induced processing and thus not a perfect recapitulation of fertilization processes. However, in this study and in previous studies with this system, we were able to show alignment between proteomic interactions taking place in the cell-free system and within the zygotes.

      2) Figure 3: Did the authors really make Western blots with the stated amounts of sperm cells and oocyte extracts? The description in the figures is not clear. This point relates to point 1. The proteins should also be detected in the following preparations: (1) the oocyte extract only (done); (2) unextracted nude oocytes, to see which potentially relevant proteins are lost by the extraction procedure (not done); (3) the permeabilized (LPC- and DTT-treated and washed) sperm only (not done); (4) intact sperm (done); (5) the assay product, after 10,000 permeabilized sperm and the equivalent of 6.67 oocyte extracts were incubated and washed 3 times (or higher amounts after this incubation; not done). Note that the amount of sperm from one assay (10,000) will likely give insufficient protein for proper Western blotting and/or Coomassie staining. In the materials and methods, I cannot find how the permeabilized sperm were subjected to Western blotting after incubation. I only see how 50 oocyte extracts and 100 million sperm were processed separately for Western blot.

      The authors did make Western blots with the number of spermatozoa and oocytes stated in the materials and methods, a total protein equivalent of 10 to 20 million spermatozoa (equivalent to ~20-40 µg of total protein load) and 100 MII oocytes (equivalent to ~20 µg of total protein load). These numbers have been corrected in the Materials & Methods. Also, we did find in the Materials & Methods section that the Co-Incubation of Permeabilized Mammalian Spermatozoa with Porcine Oocyte Extracts section refers to using cell-free exposed spermatozoa for electrophoresis; however, for none of the presented Western blot work was this true. Rather, all of the presented Western blots as per their descriptions are utilizing ejaculated or capacitated sperm or oocytes. This line has been removed from the Materials & Methods to reduce confusion.

      Regarding preparation (2), we have previously assessed the difference between oocyte extract and intact oocytes in this manner internally and we are certainly losing proteins due to the oocyte extraction process. We make caveats in this vein throughout the article such as: “Furthermore, this cell-free system while useful does not perfectly capture all the events which take place during in vivo fertilization. The cell-free system is intended to mimic early fertilization events but is presumably not the exact same as in vitro fertilization.”

      3) Figures 4, 5, 6, 7, and 8 see point 2. I do miss beyond these conditions also condition 1 despite the fact that the imaged ooplasm does show positive staining.

      For all the presented Western blots, the tissue type is stated in the image description and the protocol which was used to prepare these samples is stated in the Materials & Methods.

      4) These points 1-3 are all required for understanding what is lost in the sperm and oocyte treatments prior to the incubation step as well as the putative origin of proteins that were shown to interact with the mitochondrial sheath of the oocyte extract incubated permeabilized sperm cells after triple washing. Is the origin from sperm only (Figs 5-8) or also from the oocyte? Is the sperm treatment prior to incubation losing factors of interest (denaturation by DTT or dissolving of interacting proteins preincubation Figs 3-8)?

      The authors understand that there are proteins and interactions lost on both sides of the cell-free system equation and we have added a sentence to the Discussion to caveat this limitation in the system.

      5) Mass spectrometry of the permeabilized sperm incubated with oocyte extracts and subsequent washing has been chosen to identify proteins involved in the autophagy (or cofactors thereof). The interaction of a number of such factors with the mitochondrial sheath of sperm has been shown in some cases from sperm and others for an oocyte origin. Therefore, it is surprising that the authors have not sub-fractionated the sperm after this incubation to work with a mitochondrial-enriched subfraction. I am very positive about the porcine cell-free assay approach and the results presented here. However, I feel that the shortcomings of the assay are not well discussed (see points 1-5) and some of these points could easily be experimentally implemented in a revised version of this manuscript while others should at least be discussed.

      We agree that the use of a mitochondrial-enriched subfraction for further analysis would be interesting and useful. We are actively developing experimental protocols for oocyte extract coincubation with isolated sperm heads and tails, and eventually with purified mitochondrial sheaths. However, such experiments are contingent upon our access to porcine oocytes, which has continued to be a struggle since the COVID-19 pandemic compromised our ability to obtain oocytes in large, cheap, and reliable quantities. This was a continuous problem with preparing materials for this very paper and has continued to be an issue for our laboratory as well as many others at our university and across the country. We continue to maximize oocytes every time we can get access to them, but the unfortunate reality is that this access has become sparse and unreliable over the past three years.

    1. Costanza-Chock explains that we should be designing algorithms that are just.45 This means shifting from the ahistorical notion of fairness to a model of equity.

      This reminds me of a metaphor my high school used to properly explain the difference between equality and equity. Let's say there's a fence, and on the other side is a baseball game that you and your friend are trying to peek over and watch. You each get a box to stand on, and now you can see over the fence! Your friend, however, is shorter than you and still can't reach. Although you may have the same box to stand on (equality), in order to get the same opportunity to watch the game you have to put effort into making sure that everyone actually receives that truly equal opportunity, e.g. another box for your friend.

      Costanza-Chock's example of college admissions to explain equality vs. equity also makes me think about what kinds of digital barriers exist in place to prevent restorative justice. Issues such as accessibility, class, and status keep coming up for me and now I'm wondering: How does class background influence the attempts made by digital humanities scholars who try to perform this restoration?

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Recommendations For The Authors):

      1) The strikingly different conclusion from the previous Bourane study seems to stem from the experimental approaches. Rather than using genetic crosses that target all neurons from the hindbrain and spinal cord that express Npy at any point in development, Boyle et al target their manipulations specifically to the lumbar region of the superficial dorsal horn in adult mice using direct viral injections. Thus, Boyle is almost certainly manipulating far fewer neurons than the original study. How then are their behavioral effects so much greater? At the minimum, the authors need to discuss this discrepancy head on. Better would be a direct molecular/anatomical comparison of the neurons targeted by each approach. This could be done using Npy-Cre mice crossed to a Rosa-LSL-reporter strain and quantifying the overlap with the same markers used here. Perhaps, the intersectional approach with Lbx1 resulted in labeling of a different population of neurons than the adult AAV injections? Although likely outside the scope, given this work directly questions the main conclusion of the Bourane paper, it will be important to see a replication of the original finding of selectivity to mechanical itch.

      We agree that our approach should be manipulating a smaller population of neurons, and that it is therefore surprising that we see greater behavioural effects. Please see our response to "Weakness 1" of Reviewer 2 for consideration of this point. We have already provided a direct molecular comparison as requested by the reviewer, and this appears in Figure 1 supplement 1. Here we used tissue from NPY::Cre mice that had been crossed with Ai9 mice (i.e. a Rosa-LSL-reporter) and had received intraspinal injections of AAV.flex.GFP. We then characterised the neurochemistry of tdTomato+ cells that were GFP+ or GFP-negative.

      2) The authors state that, "91.6% ± 0.3% of cells classed as Cre-positive cells were also Npy-positive, and these accounted for 62.1% ± 0.6% of Npy-positive cells" If I am reading this correctly, does that mean that 40% of the Npy+ cells are Cre negative? If so, how is this possible?

      This interpretation is correct. For quantification of RNAscope data we used a cut-off level of 4 transcripts, and cells with fewer than 4 transcripts were classed as negative. It is likely that some of the NPY cells classified as negative for Cre would have had some Cre mRNA (sufficient to cause recombination), but at a level below this threshold. It is also possible that some NPY+ cells would fail to express Cre, since this is a BAC transgenic mouse, rather than a knock-in.
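
      The effect of the transcript cut-off on the two overlap percentages can be illustrated with a minimal sketch. The per-cell counts below are invented for illustration and are not data from the study, and the function names are hypothetical; only the cut-off of 4 transcripts comes from the response above.

```python
# Hypothetical per-cell transcript counts: (Cre transcripts, Npy transcripts).
# These numbers are made up for illustration only.
example_cells = [(10, 12), (3, 9), (5, 8), (0, 15), (6, 1)]

CUTOFF = 4  # cells with fewer than 4 transcripts are classed as negative


def is_positive(count, cutoff=CUTOFF):
    """A cell is positive for a probe when it has at least `cutoff` transcripts."""
    return count >= cutoff


def overlap_stats(cells, cutoff=CUTOFF):
    """Return (% of Cre+ cells that are Npy+, % of Npy+ cells that are Cre+)."""
    cre_pos = [c for c in cells if is_positive(c[0], cutoff)]
    npy_pos = [c for c in cells if is_positive(c[1], cutoff)]
    both = [c for c in cre_pos if is_positive(c[1], cutoff)]
    pct_cre_npy = 100.0 * len(both) / len(cre_pos) if cre_pos else float("nan")
    pct_npy_cre = 100.0 * len(both) / len(npy_pos) if npy_pos else float("nan")
    return pct_cre_npy, pct_npy_cre


# The cell with 3 Cre transcripts is Cre-negative at a cut-off of 4,
# but would be counted as Cre-positive at a cut-off of 3.
at_cutoff_4 = overlap_stats(example_cells, cutoff=4)
at_cutoff_3 = overlap_stats(example_cells, cutoff=3)
```

      In this toy example, lowering the cut-off raises the fraction of Npy-positive cells classed as Cre-positive, which is the kind of sub-threshold expression the response describes.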

      3) Similarly, the authors state that "great majority of FP-expressing neurons in laminae I-III were immunoreactive (IR) for NPY (78.5% ± 3.6%), and these accounted for 74.6% ± 1.9% of the NPY-IR neurons in this area". So does this mean 20% of the recombination is non-specific/in other cell types that could be involved in pain/itch sensation?

      Our finding that 91.6% of cells with Cre mRNA were also positive for Npy mRNA (see above) indicates that Cre expression was largely restricted to NPY cells. The failure to detect NPY peptide in some of these cells probably results from the relatively low level of peptide seen in the cell bodies of peptidergic neurons, which results from the rapid transport of peptides into their axons.

      4) Comparing Fig 3B and Fig4B it seems the control baseline von Frey responses are different. In fact, baseline response in Fig4b is quite like the CNO effect in Fig 3B. Unless I'm misunderstanding something, this seems quite odd?

      We agree that there is a difference between the baseline responses. We are not aware of any particular reason for this, and we think that it reflects a degree of variability that is seen with the von Frey test. Interestingly, the baseline values for the SNI cohort (Fig 4E) lie between the values in Fig 3B and Fig 4B.

      5) In Fig 4E, the behavior of the CNO treated mice is quite variable. Can the authors comment as to how this might be happening? Does the effect correlate with viral transduction?

      We did not see any obvious correlation between the extent of viral transduction and the behaviour of individual mice.

      6) Fig6, the PDyn-Cre experiment, is a bit of a non sequitur?

      Please see our response to "Weakness 2" of Reviewer 2 for consideration of this point.

      7) The conclusion is unusually long. I recommend trimming it to make it more concise.

      We presume that this refers to the Discussion. However, this was ~1550 words, and we do not feel that that is unusually long.

      Reviewer 2 (Public Review):

      Weaknesses

      1) There is inadequate discussion about previous studies of NPY interneurons. Specifically, the authors should address why a more restricted subset of these neurons (this study) have broader effects than seen previously.

      We have expanded the discussion on the discrepancies between our findings and those reported previously. We state at the outset that we are targeting a more restricted population (lines 509-10), and we now go into more detail concerning both similarities and differences between our findings and the reasons that we think may underlie any discrepancies (various changes between lines 522-575).

      2) I cannot see the reason for including results from manipulation of Dyn+ interneurons in this paper. First, the title does not reflect roles of spinal Dyn+ population. In addition, without further experiments characterizing relationships between NPY and Dyn interneurons in modulating itch and/or nociception, Dyn datasets seem to deviate from the main theme.

      We had previously shown that activating Dyn-INs suppressed pruritogen-evoked itch (Huang et al 2018), but it was important to test whether silencing these cells would have the opposite effect. Our finding of overlap in function (i.e. both NPY-INs and Dyn-INs suppress itch, and that both innervate GRPR cells) provides strong evidence against the idea that neurochemically-defined interneuron populations have highly specific functions, and we now state this in the Discussion. The anatomical experiments (which follow on from the functional studies) provide important new information concerning synaptic circuitry of the dorsal horn, by showing that NPY-INs preferentially innervate GRPR cells, and provide around twice as many synapses on these cells, compared to the Dyn-INs. Interestingly, this correlates with the relatively large optogenetically-evoked IPSCs that we saw when NPY-INs were activated, compared to those reported by Liu et al (2019) when galanin-expressing cells (which largely correspond to Dyn-INs) were activated. By including these findings in the paper, we are able to make comparisons between these two populations.

      3) While the authors provided convincing evidence that GRPR+ neurons serve as a downstream effector of NPY+ neuron evoked itch, the relationship between GRPR and NPY neurons in modulating pain is not examined. Therefore, Fig. 7B is pure speculation and should be removed.

      We feel that our recent findings that GRPR neurons correspond to vertical cells, that they respond to noxious stimuli, and that activating them results in pain-related behaviours, makes it reasonable to speculate that the NPY/GRPR circuit may also be involved in the anti-nociceptive action of NPY cells. The legend for Fig 7B already refers to this as a "potential circuit", and we have toned down the corresponding part of the discussion to say that our findings "raise the possibility" that this is the case (lines 605-7). We feel that this part of the figure is important, as otherwise our summary diagram ignores some of the main findings of the paper, and we hope that this is now acceptable.

      Recommendations For The Authors

      1) Fig. 1G: the "misexpression" of tdTomato neurons was much more prominent in deep dorsal horn laminae but not in the superficial ones. Was this representative? Can the authors perform a laminae specific characterization?

      We did test for this possibility in 2 NPY::Cre;Ai9 mice that had received intraspinal injections of AAV.flex.GFP, and found that there was a modest difference - 62% of tdTomato+ cells in laminae I-II, but only 39% of those in lamina III, were GFP+. This suggests that "misexpression" may have differed slightly between these regions. However, since the difference was quite modest, and we were only able to analyse tissue from two mice in this way, we did not include these findings in the paper.

      2) I have a lot of problems interpreting the c-Fos data in Fig. 2 E and F. For the mCherry- population, how was the quantification performed? From the image, it does not look like 20-30% of cells express c-Fos; at a minimum a clear stain of neurons would be needed. Similarly, the identification of NPY cells is not particularly convincing (e.g., middle arrowhead lower 2 panels in C).

      We have provided further details on how the analysis was performed (changes made to lines 1016-29). NeuN staining was used to reveal all neurons, and a modified optical disector method was performed from somatotopically appropriate regions of the dorsal horn. As noted by the Reviewer, NeuN staining was required to allow identification of mCherry-negative cells. However, we have not included the NeuN immunoreactivity in the image, as this would add considerably to the complexity. These images are from single optical sections, and therefore the overall numbers of cells are low (in comparison to what would be seen in a projected image). The intensity of mCherry staining varied between cells. However, for all mCherry-positive cells (including the example referred to by the Reviewer), there was clear staining in the membrane, which could be followed in serial sections.

      3) Please add individual data points for all quantifications.

      These have been added.

      Reviewer 3 Recommendations For The Authors:

      1) It is somewhat surprising that there is no effect on CPP after activating spinal NPY neurons in neuropathic mice, given the almost complete rescue of hypersensitivity to baseline values in the nociceptive tests. Based on the methods, it appears that conditioning was carried out already 5 min after CNO injection. Yet, suppression of c-fos activity in excitatory spinal dh neurons was observed 30min after CNO injection. Also, it is not clear to me when CNO was injected prior to the nociceptive or CQ testing?

      Have the authors considered that conditioning from 5-35 min after CNO injection might be too short after CNO injection to achieve a profound analgetic effect?

      In a previous study (Polgár et al 2023), we had observed the timecourse of CNO-evoked itch and pain behaviours in mice in which GRPR cells expressed hM3Dq. We found that these started within 5 minutes of i.p. CNO injection (e.g. Fig S2 in that paper). In addition, the timecourse of action of gabapentin and CNO (both given i.p.) are likely to be similar, and there was a preference for the chamber paired with gabapentin. We are therefore confident that the conditioning period with CNO was adequate. We now explain this in the Methods section (lines 846-52). The timing of CNO injections for the nociceptive and CQ tests is now described (lines 749-55).

      2) The authors claim that tonic pain was not affected based on the conditioned place preference test. Efficacy in withdrawal response tests and in the CPP differ by more than duration of the stimulus. I'd suggest using more cautious wording here.

      We agree that caution is needed in interpreting the results of the CPP experiments. We have therefore replaced "does" with "may" in the Results section (line 336) and "did" with "may" in the Discussion (line 620).

      3) On page 9 the authors state "...suggesting that they suppress the transmission of pain- and itch-related information in the dorsal horn." However, pain is not affected in the loss of function experiments suggesting some qualitative differences in the role of the NPY neurons in itch and pain. This should also be reflected more clearly in this statement and in the discussion e.g. "suppress itch" and "can suppress pain".

      We accept the point made by the Reviewer. We have slightly altered the wording in lines 249-51 and 610 to reflect this.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We thank the four reviewers for their generally positive feedback on the manuscript. Below, we provide a point-by-point response to each reviewer.

      We are performing new FCS and gradient measurements as suggested by the reviewers. We are confident we can have these completed within three months (accounting for the summer break).


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      *This manuscript reports a very thorough and careful study of the mobility of Bicoid in the early embryo, explored with single-point fluorescence correlation spectroscopy. Although previous groups have looked into this question in the past, the work presented here is novel and interesting because of the different Bicoid mutants and constructs the authors have examined, in particular with the goal of understanding the role of the protein DNA-binding homeodomain. The authors convincingly show that there is a significant increase in Bicoid dynamics from the anterior to the posterior region of the embryo, and that the homeodomain plays an important role in regulating the protein's dynamics. Their experiments are very well designed and carefully analyzed. The authors also modelled gradient formation to see whether this change in dynamics might play a role in setting the shape of the gradient. I am not sure I fully agree with their conclusion that it does, as mentioned in my comment below. However, it is an interesting discussion to have, and I think this paper makes a significant advance in our understanding of Bicoid's behavior in the early embryo. *

      We thank the Reviewer for their positive comments and their suggestions for improving the manuscript. We will resolve the concerns raised by the reviewer with clarity in the revision. We will also add additional comment in the Discussion regarding the interpretation of our results.

      *Major comments: *

      *1) Gradient profile quantification: Some of the conclusions made by the authors rely on the comparison between their model of gradient formation (as captured in the equations in lines 232 and 233) and the Bcd intensity profile measured in the embryos. Since the differences in gradient shape predicted by the different models are very small (see Fig. 3B, which is on a log scale and therefore emphasizes small differences, and Fig. 3C), it is very important to understand how reliable the experimental concentration profiles are.*

      This is a fair comment. It is worth noting that the key differences between the 1- and 2-component models are only apparent at large distances (and hence low concentrations) from the source.

      We performed the quantification of the gradients in a manner similar to the Gregor lab, whereby the midsagittal plane is analysed. We used 488nm illumination (rather than 2-photon, as the Gregor lab does) so our measurements are likely noisier. However, we are not investigating the variability in the gradient here, but the mean extent. We currently correct background with a uniform subtraction, but we appreciate that is not the optimal method.

      In the revised manuscript, we will repeat the above experiments using a 2-photon microscope. Further, we will image lines expressing His::mcherry without eGFP under the same imaging conditions to more accurately estimate the background signal. While we expect this to improve the data quality, we do not envisage significant change to the observed profiles based on prior experience.

      At the moment, I do not find the evidence that [Bcd] concentration profile is more consistent with a 2-component diffusion model than a 1-component model very strong. A few comments related to this:

      1a. Line 249, it is mentioned that: "observations ... incompatible with the SDD model". Which observations exactly are incompatible with the SDD model?

      The key points are in the preceding paragraph. We will improve the model presentation in the Results and also include further contextualisation in the Discussion.

      1b. In Fig. 3D, only the prediction of the 2-component model is shown. What would the simple 1-component diffusion model look like? Is it really incompatible with the data?

      We agree with this comment and will provide the 1-component fit to the gradient profiles. We expect it to fit well for the anterior half of the embryo but fail at larger distances (as has been previously shown).

      Regarding the FCS data, we also show one and two component fits. We will show the alternative fits – a 2 particle fit is clearly an improvement (see also related response to reviewer 2).

      1c. Line 243: "The increased fraction in the fast form ... consistent with experimental observation of Bcd in the most posterior" (Mir et al.). I am not sure how this is significant, since the simple model also predicts there will be Bcd in the posterior - the only difference is how much is there (as shown in Fig. 3C), and it's a very small difference.

      The absolute differences are not large between the two models, but due to the observed clustering (Mir et al. 2018), even small differences can have very large effects. In the revision we will provide estimates of the actual concentration differences.

      We are performing new experiments with the Fritzsche lab at Oxford to estimate if there is clustering of Bcd. We will also repeat our FCS experiments to validate our key conclusion of AP differences in diffusion of Bcd. These should be completed by the end of the summer.

      1d. Since the difference between models is in the posterior region where Bcd concentration is very low, when comparing the models to the data the question of background subtraction is essential. How was the subtracted background (mentioned line 612) estimated?

      See above response to the first comment.

      1e. Along the same line, were the detectors on the Zeiss LSM analog or photon counting detectors, and how confident can we be that signal is exactly proportional to concentration?

      We used PMTs and did not directly do photon counting. But the intensity is still proportional to the concentration. It is possible to estimate the absolute concentration value, e.g., Zhang et al., 2021 (https://doi.org/10.1016/j.bpj.2021.06.035). However, our main conclusions – especially regarding the spatially varying Bcd dynamics – are not dependent on this.

      1f. Can the gradients created by the two Bcd mutants (Fig. 4B) be quantified as well, and are they any different from the original Bcd gradient?

      We agree this would be useful. We will provide the gradient quantifications of the bcd mutants in the revision.

      1e. What is the pink line in Figure 5C (I am assuming the green one is the same as in Fig. 3D)? It could be better to not use normalization here, or normalize everything respective to the eGFP::Bcd data to make comparison in relative concentrations in the posterior for different constructs more evident (also maybe different colors for the three different data sets would help clarity).

      This is a fair comment, and we will create graphs with new data for better visualisation.

      1f. Discussion, lines 402-403: Does the detailed shape of the Bcd in the posterior region matter at all, since the posterior is not a region where Bicoid is active, as far as we know? Could a varying Bcd dynamics have other consequences that would be more biologically relevant?

      Bcd is now known to act at 70% EL (Singh et al., Cell Reports 2022). So, the gradient is relevant for a large extent of the embryo length, though it is not known if there is any effect in the most posterior region.

      2) Model for gradient formation (lines 231-238):

      2a. Whether the molecules of Bcd can change from their fast to slow form is never questioned. How do we know (or why might we suspect) they do exchange?

      This is a good point. Within the nucleus, and based on our mutant data, we suspect the fast/slow forms correspond to unbound/bound DNA states.

      In the cytoplasm, the dynamics are less clear. Bcd can bind to cytoskeletal elements (Cai et al., PLoS One 2017) as well as to Caudal mRNA. Therefore, it seems reasonable to have different effective dynamic modes – yet, how such switching occurs remains unclear.

      Ultimately, our model approximates multiple dynamic modes that are integrated to drive Bcd motion. Including switching between states is a reasonable assumption based on what is known about cytoskeletal and protein dynamics, but we do not have a specific mechanism.

      It is challenging to estimate a specific kon / koff rate, as the dynamic changes also depend on the diffusion – which itself is changing. For now, we believe our level of abstraction is appropriate given what is known about the system. It will be very interesting to explore the specific interactions underlying such behaviour in the future, but that is beyond this current manuscript.

      2b. The values used in the model for alpha, beta_0 and rho_0 should be mentioned. Maybe having a table with all the parameters in the method section, or even in the supplementary section, would help. The exact values of alpha and beta matter, because if they are large (fast exchange) a single exponential gradient is to be expected, if they are 0 (no exchange) a double exponential gradient is to be expected, with intermediate behavior in between. Which case are we in here?

      We agree and will add a more complete table in the revision.
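
      The two limiting cases raised in comment 2b can be sketched numerically. This is a minimal illustration of textbook steady-state limits for a point source at x = 0 with uniform degradation, not the authors' concentration-dependent model: with no exchange, the profile is a sum of two exponentials, and with very fast exchange at constant rates it collapses to a single exponential with an averaged diffusion coefficient. All parameter values and function names are hypothetical.

```python
import math


def decay_length(diff_coeff, degradation_rate):
    """Steady-state SDD length scale, lambda = sqrt(D / mu)."""
    return math.sqrt(diff_coeff / degradation_rate)


def one_component(x, c0, diff_coeff, degradation_rate):
    """Single-exponential steady-state profile C(x) = C0 * exp(-x / lambda)."""
    return c0 * math.exp(-x / decay_length(diff_coeff, degradation_rate))


def two_component_no_exchange(x, c0, frac_fast, d_fast, d_slow, degradation_rate):
    """No exchange between forms: each form decays with its own length scale."""
    fast = one_component(x, frac_fast * c0, d_fast, degradation_rate)
    slow = one_component(x, (1.0 - frac_fast) * c0, d_slow, degradation_rate)
    return fast + slow


def two_component_fast_exchange(x, c0, frac_fast, d_fast, d_slow, degradation_rate):
    """Fast exchange at constant rates: one exponential with an averaged D."""
    d_eff = frac_fast * d_fast + (1.0 - frac_fast) * d_slow
    return one_component(x, c0, d_eff, degradation_rate)


# Illustrative values only (um^2/s, 1/s, um); not taken from the manuscript.
params = dict(c0=1.0, frac_fast=0.3, d_fast=10.0, d_slow=1.0, degradation_rate=1e-3)
far_no_exchange = two_component_no_exchange(300.0, **params)
far_fast_exchange = two_component_fast_exchange(300.0, **params)
```

      With these illustrative numbers the two cases start from the same value at the source and differ mainly far from it, where concentrations are low.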

      3) Discussion about anomalous diffusion (lines 386-388): The 2-component model used by the authors to interpret their FCS data seems very well justified here (excellent fits with very small residuals). I agree with the authors' conclusion that "the dynamics of Bcd within the nucleus are more complicated than a simple model of bound versus unbound Bcd", but I don't see how that can lead to a diagnostic of anomalous diffusion instead. Maybe it is just a matter of exactly explaining what is meant by anomalous diffusion here (since this term is often used to mean different things). A more likely scenario I think, is that there are more than just two Bcd components in the system.

      This is a good point, and we can’t easily differentiate two/multi- component fits from anomalous diffusion ones. This is a known problem. But we have recently shown in a collaboration with the Laurent Heliot lab (Furlan et al, Biophys J 2019), that anomalous diffusion is a good stable indicator of changes, even if it might not be the right model. We use anomalous diffusion as it stably predicts changes. We do not claim, however, that diffusion is anomalous. We will improve the discussion of these points in the revised manuscript.
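
      For readers less familiar with the fitting functions under discussion, the sketch below writes out the standard FCS autocorrelation for free 3D diffusion in a Gaussian observation volume, in a two-component form and an anomalous form. It is an assumption that these match the functions used in the manuscript; triplet and blinking terms are omitted, and all names and values are hypothetical.

```python
import math


def diffusion_term(tau, tau_d, kappa, alpha=1.0):
    """Decay term for 3D diffusion; alpha < 1 gives the anomalous form.

    kappa is the structure factor (axial over lateral radius of the focus).
    """
    ratio = (tau / tau_d) ** alpha
    return 1.0 / ((1.0 + ratio) * math.sqrt(1.0 + ratio / kappa ** 2))


def g_two_component(tau, n, frac_fast, tau_fast, tau_slow, kappa):
    """Two freely diffusing species with equal brightness; G(0) = 1 / N."""
    fast = frac_fast * diffusion_term(tau, tau_fast, kappa)
    slow = (1.0 - frac_fast) * diffusion_term(tau, tau_slow, kappa)
    return (fast + slow) / n


def g_anomalous(tau, n, tau_d, alpha, kappa):
    """Single species with anomalous exponent alpha."""
    return diffusion_term(tau, tau_d, kappa, alpha) / n


def diffusion_coefficient(w0, tau_d):
    """D = w0^2 / (4 * tau_D), with w0 the lateral 1/e^2 radius of the focus."""
    return w0 ** 2 / (4.0 * tau_d)


# Illustrative values only: N = 5 molecules, 40% fast, tau in seconds.
g0 = g_two_component(0.0, n=5.0, frac_fast=0.4, tau_fast=1e-3, tau_slow=1e-1, kappa=5.0)
```

      Both forms add free parameters relative to a single-component fit, which is one reason the two can be hard to tell apart on the same curve.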

      4) Line 440 and after: What is the evidence that the transition between the two forms might vary non-linearly with Bcd concentration? How would that help adapt to different embryo sizes? It would be good to be more explicit here instead of just referring to another paper.

      We will improve this discussion. The central point is that the action of Bicoid is unlikely to simply depend linearly on concentration as in that case the ratio of fast to slow forms would be constant across the embryo. Related to the above comment, it is important to emphasise that we are using a phenomenological model, not one based on a specific mechanism.

      5) Since an important aspect of this work is the study of different Bcd constructs in vivo, it is important that these constructs are very clearly described, so the section on the generation of the fly lines (Methods) should be expanded. In particular:

      5a. It seems that the eGFP::NLS control used here was different from that first described in Ref. 64 (and used for FCS experiments in Ref. 30 and 36)? If so, what NLS sequence was used here, and precisely what type of eGFP was used (in particular, was the A206K mutation that prevents dimerization present in the eGFP used)? If it is the same construct as in Ref. 64, it should be mentioned explicitly.

      5b. Were the mutant N51A and R54A lines gifts as well, or have they been described before? If so, previous publications should be referenced. If not, how the plasmid was introduced in the embryo should be briefly explained.

      We agree and will expand on the fly lines in the revision.

      6) Concentration calibration measurements (Methods Fig. 2, line 568 and on). It is well known that background noise is going to interfere with the measurement of N when the signal becomes equivalent to the background noise (Koppel 1974, Phys Rev A 10:1938-1945, and for a recent discussion of this effect for morphogens in fly embryos: Zhang et al., 2021, Biophysical Journal 120, 4230-4241). It is almost certain that in the low signal regions of the embryo (e.g. posterior cytoplasm) this is affecting the reported concentration, and should be at least acknowledged.

      We agree with the reviewer. We will provide the SBR. We will also correct the N values based on the method followed in Zhang et al., 2021, Biophysical Journal 120,4230-4241.
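
      The size of this effect can be sketched with the commonly used correction for uncorrelated background, in which the apparent particle number is scaled by the squared signal fraction. Whether this is exactly the procedure the authors will apply is an assumption; the function names and numbers below are hypothetical.

```python
def corrected_n(n_apparent, total_intensity, background):
    """Correct the apparent N for uncorrelated background.

    Uses N_true = N_apparent * (1 - B / F)^2, where F is the mean total
    count rate (signal plus background) and B is the mean background.
    """
    if not 0.0 <= background < total_intensity:
        raise ValueError("background must be non-negative and below the total intensity")
    signal_fraction = 1.0 - background / total_intensity
    return n_apparent * signal_fraction ** 2


def corrected_n_from_sbr(n_apparent, sbr):
    """Same correction written in terms of the signal-to-background ratio S / B."""
    if sbr <= 0.0:
        raise ValueError("sbr must be positive")
    return n_apparent * (sbr / (1.0 + sbr)) ** 2


# Illustrative case: signal equal to background (SBR = 1).
n_low_signal = corrected_n(10.0, total_intensity=100.0, background=50.0)
```

      When the signal equals the background, the apparent N overestimates the corrected value fourfold under this formula, which is why the correction matters most in low-signal regions such as the posterior cytoplasm.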

      *7) Reference 3 is mis-characterized in two different ways in the manuscript:*

      *7a. Line 50: The conclusion in Ref. 3 was not that the gradient was due to a diffusive process, on the contrary Gregor et al. argued that Bcd was too slow to form such a long-range gradient by diffusion. Studies that do present data consistent with a morphogen gradient formation mechanism driven by diffusion are reference 5, reference 30, Zhou et al., Curr. Biol. 2012;22(8):668-75 and Müller et al., Science 336 (2012) 721-724.*

      Gregor et al., do not argue against a diffusion process – indeed, they utilise a SDD model in their paper. However, they do extensively discuss how the predicted dynamics from the SDD model are not compatible with gradient formation as observed after n.c. 13. This problem was resolved to some degree by FCS measurements of Bcd (e.g., Dostatni lab, Development 2011) and the use of a Bcd tandem reporter which showed that production and degradation change during n.c. 14 (Durrieu et al., MSB 2018). We will improve the framing of these results in the revision.

      7b. The diffusion coefficient estimated from FRAP measurements and reported in Ref. 3 (D = 0.4 micron^2/s) is mentioned a couple of times in the manuscript (line 66, line 395, line 411). However, this number is simply incorrect. When fast components (such as the ones clearly detected here by FCS) are present, they diffuse out of the photobleached area during the photobleaching step. If that is not corrected for during the analysis (and it wasn't in Ref. 3), then the recovery time measured is just equal to the photobleaching time, and has nothing to do with either the fast or slow fraction of the studied molecule - it has no other meaning than to give a lower bound on the value of the actual effective diffusion coefficient of the molecule. This effect (called the halo effect) is well known in the FRAP community (see e.g. Weiss 2004, Traffic 5:662-671), it has been experimentally demonstrated to occur for Bcd-eGFP in the conditions used in Ref. 3 (Reference 30), and the actual diffusion coefficient that should have been extracted from the data presented in Ref. 3 has been recalculated by another group to be instead D = 0.9 micron^2/s (Castle et al., 2011, Cell. Mol. Bioeng. 4:116-121). It would therefore be better to report the corrected value from Castle et al. to help the field converge towards an accurate description of Bcd mobility.

      We fully agree and will use the improved FRAP estimated value for Bcd.

      *Minor comments and suggestions: *

      • 8) Figure 1: From panel A, it seems that what is called "Anterior" and "Posterior" is about 150 micron away from the embryo mid-section, i.e. about 100 micron from either the anterior pole or the posterior pole (so not the tip of the embryo, but somewhere in the anterior half or posterior half). Maybe this should be made clear in the text. *

      We have made changes in Figure 1A to indicate the region within which the FCS measurements are carried out. We have added the relevant details in the legend of figure 1 lines 137-138.

      *9) Fig. 2A; It might be good to put this graph on a log scale, so that cytoplasmic values are seen more clearly. Also, what about reporting on nuclear to cytoplasmic ratios? *

      We will rework this graph and make the necessary changes.

      *10) Fig. 2: It could be interesting to plot D_effective as a function of the measured concentration of Bicoid in different locations, since the (interesting) suggestion is made several times that [Bcd] could be a determinant of the protein mobility. *

      Our work provides an indication that Bcd concentration is connected to its diffusion. We showed this by measuring at two locations. Extending this to a rigorous model would require substantial new measurements along the whole length of the embryo. While interesting, this represents a very large investment of time and lies beyond the current manuscript.

      *11) Figure 3B&C: Is the curve for 2-component diffusion (without concentration dependence) for steady-state missing? *

      We will clarify in the revision.

      *12) Lines 78 and 471: What do the authors mean by "new reagents"? The word reagent evokes a chemical reaction, but there are none here. Do the authors mean new constructs? or new mutants? *

      We have changed lines 78 and 479 from “new reagents” to “new Bcd mutant eGFP lines”.

      *13) Lines 57-59: Another good reference for FCS measurements performed to study the dynamics of a morphogen (in this case Dpp) is Zhou et al., Curr. Biol. 2012;22(8):668-75 *

      We added this reference in no.70.

      *14) Lines 109-111: A word must be missing. Precisely determined what? *

      Precisely measure within cytoplasm, and nuclear compartments and also during interphase stages. We have changed to “precisely measure in the cytoplasmic and nuclear regions during the interphase stages of nuclear cycles (n.c.)12-14.” in line no.111-112.

      *15) Line 278: The increase in the slow mode is expected. Maybe explicitly mention why. *

      In line 286, we have added “due to the loss of Bcd binding to the DNA”.

      *16) Line 282: "with the fast component increasing", maybe replace with "with the diffusion coefficient of the fast component increasing" or "with the fraction of the fast component increasing". *

      We have changed line 289 to “with the diffusion component of fast component increasing towards the posterior”.

      *17) Line 517: Is there a reason why the dorsal surface is always placed in the coverslip? *

      We have added these details in line 528-529 in Methods.

      *18) Line 524 and on: FCS measurements: What was the duration of each individual FCS measurement? It is great that the exact number of measurements are reported in the supplementary! *

      Thank you for the compliment. Typically, cytoplasmic measurements are 60 s and nuclear measurements are 20-40 s. We have added this in lines 528-529. We also added a column to indicate the duration of each of the measurements in the supplementary tables.

      *19) An Airy unit of 120 um seems large in combination with an objective with a NA of 1.2, is there a reason for that? What was the radius of the resulting detection volume? *

      Olympus microscopes have a 3x magnification stage in their confocals. This leads to the change in the Airy unit. Otherwise, it would be 40 µm.

      *20) Thank you for detailing the reasons behind the choice of excitation power, an important and often omitted details. Where in the excitation path were the values of the laser power measured (before or after the objective?)? *

      Thank you for the compliment. The laser power is measured before the objective. We removed the objective and measured the laser power in the objective path.

      *21) Line 585: "since the brightness of eGFP::Bcd..." do the authors mean the molecular brightness of a single eGFP::Bcd molecule, or the total fluorescence signal? *

      It is the total fluorescence signal. We have edited line no.592.

      *22) It would be good for reference to mention the approximate value of the molecular brightness recorded for these eGFP constructs at the laser power used. *

      We will measure and tabulate in the revised manuscript.

      *23) Reference 766: The year (and maybe other things) is missing. *

      We have corrected this reference.

      *24) Figure 2 (Methods): The concentrations shown on the figure should be in nM not uM. *

      Thanks for noticing; we have changed this.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      MAJOR POINTS

      • 1) FCS measurements and fits *
      • a) Please state the duration of each individual FCS measurement. *

      In the cytoplasm, the measurements were carried out for 60 secs and in nuclei it is between 20-40s. We could not measure for 60s in the nuclei as the nuclear position fluctuates from its initial position. We will add another column to indicate the duration of FCS measurements in the supplementary tables.

      b) The authors acknowledge potential issues with fluorophore photophysics and use different lag time ranges for the calibration dye Atto-488 (0.001 ms in Method Fig. 2) and eGFP (0.1 ms in the main figures). Given the strong influence of different parameters on data interpretation and conclusions, Method Fig. 2 should be repeated with purified eGFP. This is particularly relevant for the noisy FCS measurements in posterior regions.

      Performing the experiment with purified eGFP will be a volume calibration. We routinely performed this before each imaging session, and that should be fluorophore independent. As noted by Reviewer 1, it is also important to be clear about background correction. We will provide brightness data for eGFP and background values in the revised manuscript. We can then use this to estimate the corrected concentrations.

      We start the fits at 0.1 ms because, by that point, any contribution from the photophysics should have decayed (0.1 ms is about 3-5 times the decay time of the photophysical process; Sun et al., Analytical Chem 2015).

      c) Please explain why no data is shown for "AN" around 0.1 ms lag time in Fig. 1B in contrast to all other figures.

      We will add the data for AN from 0.01 in the revised figures.

      d) Please state what the estimated diffusion coefficients with one-component model fits are. Please also explain why the fits in Fig. S1E do not reach a value of 1 and why they plateau higher than the experimental data at long lag times. Please constrain the fits to G=1 at 0.1 ms tau and G=0 at 1 s tau to make a fair comparison.

      The experimental ACF curves reach 0 at long lag times as would be expected. The one-component fits, however, don’t describe the data well and as a result they do not reach 1 and 0 at short and long lag times, respectively. The fitting is done using a mean-squared estimation of the best approximation of the particular model function to the data. Fixing the parameters can be done, but it will further reduce fit accuracy and deviations will be larger. We will perform this analysis and tabulate the one component fits in supplementary 1 with necessary corrections.
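
For orientation, the difference between one- and two-component descriptions can be sketched with the standard 3D free-diffusion autocorrelation model. This is an illustration only, not the fitting code used for the manuscript: the function names, the structure factor `kappa = 5`, the assumption of equal molecular brightness for both components, and all parameter values are hypothetical choices made for the example.

```python
import math

def acf_3d(tau, n, tau_d, kappa=5.0):
    """One-component 3D diffusion ACF; equals 1/n at tau = 0 and decays to 0."""
    lateral = 1.0 + tau / tau_d
    axial = math.sqrt(1.0 + tau / (kappa ** 2 * tau_d))
    return (1.0 / n) / (lateral * axial)

def acf_3d_two(tau, n, f_fast, tau_fast, tau_slow, kappa=5.0):
    """Two-component 3D diffusion ACF; f_fast is the fraction in the fast mode
    (both modes are assumed to have the same molecular brightness)."""
    fast = f_fast * acf_3d(tau, 1.0, tau_fast, kappa)
    slow = (1.0 - f_fast) * acf_3d(tau, 1.0, tau_slow, kappa)
    return (fast + slow) / n

# Hypothetical parameters; lag times in seconds.
for tau in (1e-4, 1e-3, 1e-2, 1e-1, 1.0):
    g_one = acf_3d(tau, n=10.0, tau_d=5e-3)
    g_two = acf_3d_two(tau, n=10.0, f_fast=0.6, tau_fast=1e-3, tau_slow=5e-2)
    print(f"tau={tau:.0e} s  one-component G={g_one:.4f}  two-component G={g_two:.4f}")
```

A two-component curve with well separated diffusion times is stretched over more decades of lag time than any single-component curve, which is one reason a one-component fit can miss both the short- and long-lag ends of the data at once.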

      e) Please assess the validity of all multi-component fits by comparing the relative quality of the models to the number of estimated parameters using the Akaike information criterion or similar approaches.

      We will provide the values denoting the quality of the fits in the revision. We will provide the 3D 1 particle fit, the 3D 1 particle fit with triplet, the 3D 2 particle fit and the 3D 2 particle fit with triplet, together with appropriate measures of fit quality.
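
One way such a comparison could be set up is sketched below, using the least-squares form of the Akaike information criterion suggested by the reviewer. This is a minimal sketch under stated assumptions: Gaussian, equal-variance residuals; the helper names are hypothetical; and the residuals and parameter counts (2, 4 and 6) are invented for illustration and are not results from the manuscript.

```python
import math

def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit with Gaussian errors: n * ln(RSS / n) + 2k.
    Assumes the residual sum of squares is non-zero."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params

def rank_models(fits):
    """fits maps model name -> (residuals, n_params).
    Returns (names sorted by AIC, best first; dict of AIC scores)."""
    scores = {name: aic_least_squares(res, k) for name, (res, k) in fits.items()}
    return sorted(scores, key=scores.get), scores

# Invented residuals: the two-particle fits leave smaller residuals than the
# one-particle fit, and adding a triplet term does not reduce them further.
fits = {
    "3D 1 particle": ([0.02 * (-1) ** i for i in range(100)], 2),
    "3D 2 particle": ([0.005 * (-1) ** i for i in range(100)], 4),
    "3D 2 particle + triplet": ([0.005 * (-1) ** i for i in range(100)], 6),
}
order, scores = rank_models(fits)
for name in order:
    print(f"{name}: AIC = {scores[name]:.1f}")
```

With these invented numbers the extra triplet parameters are penalised because they do not reduce the residuals, which is the kind of statement the criterion is designed to support.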

      f) Please also present the Bcd-GFP fits with 0.001 ms that are mentioned in line 590, and present the results for the data that did not give comparable tau_D1 and tau_D2 values mentioned in line 593.

      We will provide all the curves from 0.001ms in the supplementary. We did not provide these details as we have followed the methods from Abu Arish et al., 2010. As our cytoplasmic and nuclear TauD values match with Abu Arish et al., 2010 and Porcher et al., 2010, we thought the excess data would be redundant.

      3) Bicoid gradient and modeling * a) Little et al. 2011 observed that the Bcd gradient decreases around n.c. 13. Can the authors of the present work observe a similar concentration decrease using FCS? This is important to i) validate the FCS concentration measurements, and ii) to resolve the controversy regarding "previous claims based on imaging the Bcd profile within nuclei, which predicted decrease in Bcd diffusion in later stages".*

      This is a good point regarding conclusions from the previous literature. The Little et al. paper inferred that diffusion had to decrease from fitting to the gradient profiles. However, subsequent analysis from our lab (Durrieu et al., MSB 2018 [which uses a different method involving a tandem reporter for Bicoid] and this manuscript) strongly suggest that Bicoid remains dynamic, at least through n.c. 13 and early n.c. 14. One way to test this is to use SPIM-FCS, where longer time courses can be taken (though with slower time resolution in the FCS). We have performed preliminary experiments with SPIM-FCS and we will revisit this data to see if we can find evidence for changes in the diffusion.

      We will also extend the Discussion to make the results clearer in terms of previous models and literature.

      b) Please explain why the experimental Bcd-GFP gradient data does not reach a value of 1 (e.g. in Fig. 3D) despite normalization. Please also explain why the fits become flatter in Fig. 5B compared to the steep fit in Fig. 3D.

      Both lines were measured under identical conditions. Therefore, we normalised to the maximum value of both experiments. We will redo, normalising to each individual experiment. Regarding Fig. 5C, the Bcd::eGFP curve is identical to Fig. 3D. The flatter curve is the line with eGFP tagged to a NLS alone.

      c) For modeling, please take into account observations that the Bcd source is graded with a wide distribution (30-40% EL, see Spirov et al. 2009, Little et al. 2011, Cai et al. 2017 etc.). The extent of the source used in the present work (x_s=20 um, line 620) is at least five times too small.

      Care must be taken in defining the source extent. The most careful measurements are reported in Little et al., PLoS Biology 2011, who performed single molecule FISH. They conclude “We demonstrate that all but a few mRNA particles are confined to the anterior 20% of the egg”. Further, the peak in the particle density is around 20-30 um from the anterior (Figure 3, Little et al., PLoS Biology 2011), with the vast majority of counts being within 10% of the anterior pole. Further, Durrieu et al., MSB 2018, showed using a Bcd tandem reporter that there was unlikely to be an extended gradient of bcd mRNA (maximum extent of around 50 um). Here, we used a simple source domain, which was arguably a little narrow, but not significantly so. We will increase the value in the revision, but the claim that there is an extended bcd mRNA gradient (Spirov et al., Development 2009) has not been substantiated by later experiments.

      • d) Please discuss in the paper how well the simulations in Fig. 3B agree with the experimental data.*

      We will provide these details in the revision.

      • e) Please provide a precise estimate for the statement "Even with an effective diffusion coefficient of 7 μm2s-1, few molecules would be expected at the posterior given the estimated Bcd lifetime (30-50 minutes)" to turn this into a quantitative argument. How many molecules are expected to reach posterior in which model, and how does it compare to experimental observations?*

      This can be estimated based on the root-mean-square distance for diffusive processes. We will provide this in the revision.
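
A rough version of this estimate can be sketched as follows. It is a back-of-the-envelope illustration, not the calculation that will appear in the revision. The assumptions are: a single effective diffusion coefficient of 7 um^2/s (the value quoted by the reviewer), the quoted 30-50 minute lifetime treated as a mean lifetime 1/k, a steady-state exponential SDD profile, and a hypothetical distance of 450 um from the source to the posterior region.

```python
import math

def rms_distance(d_um2_per_s, t_s, dims=1):
    """Root-mean-square displacement sqrt(2 * dims * D * t), in micrometres."""
    return math.sqrt(2.0 * dims * d_um2_per_s * t_s)

def sdd_decay_length(d_um2_per_s, lifetime_s):
    """Steady-state SDD decay length, lambda = sqrt(D * lifetime), in micrometres."""
    return math.sqrt(d_um2_per_s * lifetime_s)

def relative_level(x_um, decay_length_um):
    """Steady-state concentration at x relative to the source: exp(-x / lambda)."""
    return math.exp(-x_um / decay_length_um)

D = 7.0              # um^2/s, effective diffusion coefficient quoted by the reviewer
POSTERIOR_X = 450.0  # um, assumed distance from the source (hypothetical value)

for lifetime_min in (30, 50):
    lifetime_s = lifetime_min * 60.0
    lam = sdd_decay_length(D, lifetime_s)
    print(
        f"lifetime {lifetime_min} min: "
        f"1D RMS distance = {rms_distance(D, lifetime_s):.0f} um, "
        f"decay length = {lam:.0f} um, "
        f"relative level at {POSTERIOR_X:.0f} um = {relative_level(POSTERIOR_X, lam):.3f}"
    )
```

Under these assumptions the decay length stays well below the assumed distance to the posterior, so only a few percent of the source level would be expected there.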

      • f) The sentence "we find that a model of Bcd dynamics that explicitly incorporates fast and slow forms of Bcd (rather than a single "effective" dynamic mode) is consistent with a range of observations that are otherwise incompatible with the standard SDD model" needs to be toned down and corrected since a simple SDD appears to be sufficient to account for the observed gradients. If the authors disagree, please specifically point out in the paragraph around line 249 what observations exactly are incompatible with a standard SDD model.*

      This is similar to the point raised by Reviewer 1. While the standard SDD model can explain the overall gradient shape, it is not compatible with the observed time scales and Bcd puncta tracked in the posterior pole. We will improve the Discussion around this point to make the distinctions between the models clearer.

      • 5) Data presentation *
      • a) In line 27 and 122 it would be better to rephrase the wording "find/found" and give credit to previous papers that first made these observations. *

      We will edit in the revision.

      • b) For the statement "This suggests that the dynamics of the fast fraction were not captured by previous FRAP measurements", please explain why this should not be the case even though the fast fraction is shown to be larger than the slow fraction in the current work.*

      We will edit in the revision.

      • c) Similarly, the sentence "The dynamics of the slower mode correspond closely to measured Bcd dynamics from FRAP" likely needs to be corrected since it neglects the contribution of the faster mode, which is fluorescent as well and should also contribute to the dynamics from FRAP.*

      This is similar to the point raised by Reviewer 1 and we will edit in the revision.

      d) In the absence of further evidence (see above), the sentences "We establish that such spatially varying differences in the Bcd dynamics are sufficient to explain how Bcd can have a steep exponential gradient in the anterior half of the embryo and yet still have an observable fraction of Bcd near the posterior pole" and "These results explain how a long- ranged gradient can form while retaining a steep profile through much of its range" in the abstract need to be toned down.

      We are not sure here what needs to be toned down. Our results show that there are (at least) two dynamic forms of Bcd and, combined, they are capable of forming a long-ranged gradient while also ensuring the gradient remains steep in the anterior (because the diffusion coefficient itself varies across the embryo). We will go through these statements and make sure the meaning is clear.

      e) The authors state that "However, we show that eGFP::Bcd in its fastest form can move quickly (~18 μm2s-1), and the fraction of eGFP::Bcd in this form increases at lower concentrations", but this has not been directly shown. Please tone down this statement or directly test the prediction that Bcd has a higher fraction of the fast form in earlier nuclear cycles when Bcd concentration is smaller.

      This is a good suggestion, and we will test whether early nuclear cycles of the anterior domain show faster dynamics.

      *MINOR POINTS * * 1) Introduction * * a) Please explain explicitly what exactly the contention in Bcd, Nodal and Wingless dynamics is in the cited references. *

      We will add in the revision.

      *b) In line 95, it would be better to state that this is a variation of the SDD model rather than "a new model". *

      We changed from “a new model” to “an improved version of SDD model” in the current version of the manuscript.

      *2) Methods * * a) The authors state that "The same software was also used to calculate the cross-correlation function", but I couldn't find any cross-correlation analyses. Please clarify. *

      It is line 538. There is no cross correlation. We changed this to the autocorrelation function.

      b) Please correct the "uM" typo to "nM" in the legend of Method Fig. 2A.

      We have changed this in the current version.

      • c) In the sentence "Further, since the brightness eGFP:Bcd in the anterior and posterior cytoplasm is lower compared to the nuclei", "brightness" probably needs to be changed to "concentration" since the molecular brightness is unlikely to change. *

      We edited the line no.591.

      • d) Please explain the background-correction method mentioned in line 612. Please also state at what temperature the experiments were performed.*

      We will add a better background correction in the revision. Currently, we use the non-embryo background as the background noise. The measurements are carried out at 25 °C.

      *3) Results * * a) Please provide labels for anterior, posterior, dorsal and ventral in Fig. 1A. * * b) Please explain the colors in Fig. 5C. * * c) Please explain the dashed lines in Fig. 3C. * We have edited Figure 1A and Figure 5C. We will edit Figure 3C in further revision.

      *OPTIONAL * * 1) If possible, it would be helpful to mention whether the transgenic animals have any abnormal phenotypes or whether they can rescue the bcd mutant. * We will update in the revision.

      *2) To validate the concentration measurements, it would be ideal if the authors could determine the Bcd concentration gradient using FCS along the anterior-posterior axis. This would also address whether there are further unexpected changes in diffusivity in medial regions and along the anterior-posterior axis that would have to be considered for modeling. * To measure the Bcd concentration using FCS along the whole axis would be a very challenging undertaking. To get the data for the two positions analysed already represents a significant amount of work. We have done SPIM-FCS measurements, and we will be repeating our FCS measurements in the Fritzsche lab at Oxford. Combined, we believe this provides sufficient corroboration of our results.

      *3) Local photoconversion experiments, e.g. in Bcd-Dendra2 embryos if available, would provide compelling support for the relevance of the measurements in the current work. * This is a nice idea, but this would represent a substantial project in its own right and lies beyond the current work.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      *In my estimation the experimental work is rigorous and the results fully support the conclusions of the authors. I was surprised, however, that the HD-only form localizes via very different and simpler dynamics than does full-length Bcd, but nevertheless forms at least a qualitatively similar gradient. That leads to the question as to whether the existence of the fast and slow forms and their different ratios in different parts of the embryo actually are physiologically relevant. I don't see a straightforward way to test this experimentally, because the mutations that effect Bcd gradient formation also affect essential functions of the protein that if abrogated produce severe downstream effects on embryonic development and lethality. However I would like to see this point at least addressed in the discussion. The data and the methods are presented in such a manner that they can be reproduced, and the number of replicates and statistical analysis is overall robust. * We thank the Reviewer for the positive and constructive review. They, like both previous reviewers, raise the issue of the model and how it fits with the data. As outlined above, we will improve this part of the data presentation and also the Discussion to make sure the main results are clear.

      We agree that the underlying importance of the different dynamic forms of Bicoid – and why they change across the embryo – remains unknown. We believe that our careful characterisation of such behaviour is important nonetheless, as it reveals that: (1) morphogen dynamics are more complicated than typically modelled, and this may be just as relevant for ligands moving through extracellular space; and (2) dynamics can vary in space/time, providing an additional possible mechanism of control for regulating morphogen gradient profiles.

      Of course, we would like to explore potential physiological relevance. Further exploration of the homeodomain and its role in regulating dynamics is a potential route, but that belongs in future work.

      *Minor comments: *

      • The presentation of the graphical data measuring Bcd levels along the a-p axis (Fig 1C, 1D, 4C-F and others) needs to be improved, because the grey lines that represent ACF curves are essentially invisible. This is partly because there is usually extensive overlap between the grey lines and other lines. This may be solved by using a more vivid colour than grey for the ACF curves, or perhaps the ACF lines could be made thicker but with some transparency so that overlapping data can be seen. In any event this aspect of the presentation needs to be improved. * We have made the ACF lines thicker to distinguish from the model fit.

      *In Figs 2D and 2I measurements of statistical significance between the proportion of protein in fast and slow modes need to be added. * We will add in the revision.

      *Relevant to line 174 and Fig 2, NLS should be defined when first used, the source of the NLS should be given (is it from Bcd?) and the rationale for looking at eGFP::NLS should be made explicit. *

      We have added details on how the eGFP::NLS is generated in the methods.

      *In Fig 3D the dashed lines need to be defined. I assume these are experimental error bars but this is not stated. *

      We now state this in the legends.

      *On lines 344-5, shouldn't this conclusion concern the HD rather than the NLS? * Yes, thanks for pointing it out; it is related only to NLS, not NLSHD. We removed this statement from line 351.

      *On line 432, CAP is not an acronym, the correct term is 5' 'cap' or 'cap structure'. Also Cho et al. PMID 15882623 should be added to the references here. * We changed the corresponding section and added the references.

      *On lines 446, 456, 469, and throughout: replace 'blastocyst' with 'blastoderm'. The former term is generally used for embryos that undergo full cellular divisions and cleavage in early embryogenesis, not for syncytial embryos such as Drosophila. * We have changed blastocyst to blastoderm throughout the manuscript.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Major comments: The averaged autocorrelation curves were fitted to models of diffusion with one and two components. The one-component model was insufficient to reproduce the data and the two-component model seems to fit the data. Have the authors tested models with more than two components? Could it be possible to distinguish more Bcd populations?

      While it is possible to fit with further components, it rarely provides useful further insight. In particular, the error in measuring three tau_D’s is typically very large. In addition, the improvement in the fit will be marginal, and thus the extra components cannot be justified statistically. Of course, we cannot exclude a third (or more) possible dynamic modes, but within the resolution of our FCS measurements two components with triplets are in general the maximum that can be accommodated without overfitting. We will provide evidence for this claim in the supplement of the revised manuscript.

      In Figure 2E, the same concentration of eGFP::NLS is estimated to exist in the cytoplasm and nucleus. Since the NLS should target eGFP to the nucleus, what is the explanation for this observation? Is it possible that the method used to estimate the concentration of molecules is underestimating the concentration in the nucleus or the opposite in the cytoplasm?

      This is a good observation. There are two possible explanations. First, the regular division cycles “reset” the nuclear levels. Therefore, differences may not be so large. Second, FCS measurements of concentration can be noisy, as they depend on the very short time scales in the measurement. We will double check our measurements and clarify this in our revision.

      *In the simulation of the SDD model (Figure 3B), simulations at 10 min, 25 min and 120 min are shown. Assuming that 120 min corresponds to early nc14, are simulations at earlier timepoints corresponding to nc12 and nc13 indistinguishable from the profile at 120 min? This demonstration would further support the option to merge the data from all nuclear cycles. *

      This is a good point. Here, we were primarily focused on showing the time evolution of the model, rather than directly mapping onto experiment. We will clarify in the revision.

      *The results obtained with the BcdN51A mutant show an increase in diffusion speed, while retaining similar proportions of fast and slow populations. In the slow fraction, a new population is found. Assuming that the BcdN51A molecules cannot bind specifically to DNA due to the mutation, what would this newly found population correspond to? Could the authors explore the possibility of nonspecific binding to DNA? The article would also win by discussing more on this aspect or other options. *

      This is an interesting question. Dslow for anterior nuclei of N51A mutants increases (Dslow from ~0.2um2/s to ~1.5 um2/s), and the proportion is similar to the slow fraction of WT Bcd in the anterior nuclei (F=50%). The Dslow values of bcdWT suggest that 0.2um2/s is a result of DNA binding. For bcdN51A, Dslow of 1.5 um2/s is suggestive of nonspecific interaction of bcdN51A to the DNA. Such a nonspecific interaction is also noticed in the case of NLS::eGFP, where we see a significant amount (Dslow~ 1-1.5 um2/s , F=20%) of slow form in the anterior nuclei, likely due to non-specific interaction with the DNA.

      It is worth noting that the inactive homeodomain of the transcription factor Sex combs reduced (Scr) also interacts non-specifically with DNA at high concentration (Vukojevic et al., PNAS 2010). Non-specific interaction of the eGFP fluorophore is also reported to be higher in the nuclei of AT-1 cells, which suggests that the “obstacle-free accessible space” in the nuclei is low (Wachsmuth et al., JMB 2000). Therefore, though we do not understand the specific mechanism, our results for N51A mutants are aligned with previous observations of intranuclear dynamics.

      The experimental rationale behind the BcdMM reporter needs to be better explained as it is not clear. It was previously shown that the N51A mutation disturbs zygotic hb activation and Caudal gradient formation (see Figure 3 in Niessing et al., 2000). Since N51A already causes a strong phenotype by disturbing hb expression and Cad gradient formation, what is the reasoning behind adding extra mutations to this background? Since the mutations in the PEST domain and YIRPYL motif are involved in cad translational repression, would it be more interesting to add them to the R54A mutation and further study the repression of cad? It would also shed light on the unexpected lack of difference, or even decrease, in diffusion in the cytoplasm of the R54A mutant, which should increase if indeed the cad mRNA binding is being repressed.

      Our rationale was to remove more elements of Bcd to see if there was some degree of redundancy – at least in terms of the dynamics.

      The Bicoid homeodomain N51A mutation is known to cause de-repression of caudal and to inhibit hunchback expression. Mechanistically, nuclear Bcd activates hb transcription. However, in the cytoplasm Bcd interacts with other proteins and forms a complex to de-repress caudal. At one end of the complex, Bcd binds to caudal mRNA through its HD; at the other end, other proteins in the complex are bound to the 5’ cap region of caudal mRNA. Our rationale for generating the MM mutation was that the N51A mutation may not be sufficient for Bcd to be released from the protein complex. Therefore, mutations additional to N51A may release Bcd from interactions with either DNA or with other proteins through the PEST domain and YIRPYL motif.

      *Have the authors confirmed that their BcdR54A indeed inhibits cad translation? *

      We have not tested whether eGFP:bcdR54A inhibits cad translation. We will add the data in the revision.

      *How many embryos of BcdMM were analysed? The authors should also provide a table with all the values in SI as they have done for all the other reporters. *

      We will add this data with the revision.

      *The claims with eGFP::NLSBcdHD need to be supported by data from multiple embryos. Even if multiple ACF curves are obtained from one embryo, analysing only one embryo is not sufficient. This would clarify the fact that this reporter seems to be able to reproduce the mobility of Bcd in the nucleus. *

      We agree and we are arranging to collect more data. This should be completed by the end of the summer.

      *According to the methods, all reporters were expressed in a bcd null background, made with the bcd1 allele. This allele is also known as bcd085 and according to Driever and Nusslein-Volhard, 1988 (PMID: 3383244), this allele only causes an intermediate phenotype. This indicates that a truncated version of the protein probably still exists on the embryo. Do the conclusions obtained here still hold if a truncated version of the Bcd protein exists in addition to their reporters? *

      We used the bcdE1 mutant, a null mutant of bcd. This was used by Gregor et al., Cell 2007 in their generation of the original Bcd::eGFP. We have also recently generated a more complete bcdKO mutation (Huang et al., eLife 2017). Our embryos do not have a clear phenotype that we can relate to the specific bcd- background used. Nonetheless, we agree it is an important point to be clear about the genetic background and we will clarify in the revised manuscript.

      Minor comments: * * In line 45: "Morphogens are signalling molecules", the authors should consider removing the word "signalling" since not all morphogens are, especially the one being studied, Bicoid. * * In lines 80-81 (and also throughout the text): "We measure the Bcd dynamics at multiple locations along the embryo AP-axis", should be more accurate and changed to anterior and posterior of the embryo. Using "multiple locations along the AP axis" is ambiguous and not exact for what was done.

      Yes, this is a fair comment. We have edited these sections in the current manuscript.

      *Throughout the article, the authors refer multiple times to "modes for/of Bcd transport". Since they or others have not proven that Bcd is being transported, which would involve at least another factor, the authors should replace transport by movement, diffusion or a similar word with which they are comfortable. *

      We have changed transport to movement wherever relevant in the text.

      *Suggestion: The authors claim that the Bcd gradient is exponential up to 60% of embryo length. Would this information allow a more precise calculation of the gradient decay length in the exponential region than the 80-100µm stated on line 202? *

      This is an interesting point, but our results suggest that the idea of the decay length is not so applicable in the posterior region. There, the Bcd dynamics are generally quicker, thereby increasing the decay length λ. Of course, we cannot discount possible spatial variation in degradation. However, in previous work, our Bcd tandem reporter (which is sensitive to changes in degradation) did not reveal spatial variation in degradation.

      In lines 258-259, the sentence "Further, Bcd binds to caudal mRNA, repressing its expression in the cytoplasm" should be improved to clarify the role of Bcd in caudal mRNA translation repression and references should be added. This should also be corrected in the following paragraph.

      We will add the necessary corrections in the revision.

      *In line 262, "mutations" should be singular since it corresponds to only one amino acid mutation. *

      We have corrected this.

      *Figure 4J needs to be corrected as the fractions of the slow and fast populations do not correspond to what is shown in Table 3. For example, Fslow fraction of AC is ~45% in the figure while it is 36% in Table 3. The problem occurs in all fractions. *

      We apologise for the mislabelling in this figure: AN was in the place of AC. We have edited Figure 4J to correct it.

      *In the discussion, in lines 379-380, "Given the changing fractions of the fast and slow populations in space, the interactions between the populations are likely non-linear". What is the reasoning for non-linearity and not interchangeability? *

      If the interactions between the two populations were linear, then the fraction in each form would be constant across the embryo. Some degree of nonlinearity is required in order to have spatially varying relative populations.
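
      A toy illustration of this argument follows, assuming the two forms are in local equilibrium at each position. The exchange laws and the constant `k` are hypothetical and are not fitted to any data; the sketch only shows that a linear exchange gives a concentration-independent slow fraction, whereas a nonlinear one does not.

```python
import math

def slow_fraction_linear(total, k):
    """Linear exchange: slow = k * fast, so the slow fraction is k / (1 + k)."""
    fast = total / (1.0 + k)
    return (total - fast) / total

def slow_fraction_quadratic(total, k):
    """Nonlinear exchange: slow = k * fast**2 (hypothetical dimer-like slow form).

    fast solves k * fast**2 + fast - total = 0.
    """
    fast = (-1.0 + math.sqrt(1.0 + 4.0 * k * total)) / (2.0 * k)
    return (total - fast) / total

# Total concentration falling from anterior to posterior (arbitrary units).
totals = [100.0, 30.0, 10.0, 3.0, 1.0]
linear = [slow_fraction_linear(t, k=0.5) for t in totals]
nonlinear = [slow_fraction_quadratic(t, k=0.5) for t in totals]
```

      In this sketch `linear` is the same at every position, while `nonlinear` falls as the total concentration falls.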

      *In line 432 caudal should be italicized. *

      We have edited this.

      *In the discussion, the authors conclude that "In the nucleus, the two populations can be largely (though not completely) explained by Bcd binding to DNA". The discussion would benefit from explaining all the possible options. *

      We will add the necessary changes in the discussion. This also relates to the reviewer comments above.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper studies color vision in anemonefish. The central conclusion of the paper is that anemonefish use signals from their UV cones to discriminate colors that would not otherwise be distinguishable; this differs from other fish in which UV cones extend the range of wavelengths of sensitivity but do not add a dimension to color vision. The work fits into a rich history of studies investigating how color vision fits into an animal's ecological niche. My primary concerns regard the microspectrophotometry data from single cones and some aspects of the presentation of the behavioral data.

      Microspectrophotometry

      The spectral properties of the cone types are a key issue for interpreting the results. These were measured using MSP, and fits are shown in Figure 2. The raw data shown in Fig. S1 appears more complicated than indicated in the main text. The templates miss the measurements across broad wavelength bands in each cone type. Particularly concerning is the high UV absorbance across cone types and the long-wavelength absorbance in the UV cone. It is not clear how this picture supports the relatively simple description of cone types and spectral sensitivities given in the main text and which forms the basis of the modeling.

      Microspectrophotometry is an inherently noise-prone measurement technique, particularly for very small photoreceptor outer segments such as those of single cones, which are also difficult to detect as intact, isolated (non-overlapping) cells. As such, the absorbance curve fits and derived lambda max (λmax) values should be treated as estimates. The accuracy of these estimates is adequate for this type of study, and visual modelling results have been shown to be robust against small errors (±10 nm λmax) in photoreceptor sensitivity for multiple species [see Lind, O. & Kelber, A. (2009). Vis Res. 49(15), 1939-1947; and Bitton, PP. et al. (2017). PLOS ONE, 12: e0169810]. We consider it highly unlikely that small shifts in cone λmax from measurement error would make a meaningful difference to the colour discrimination thresholds.

      It should be noted that the raw data shown in the original Supplementary Figure 1 included all scans overlain with an average absorbance curve for presentation purposes; however, the actual lambda max values for the different cone types were measured from individual scans fitted with photopigment absorbance curve templates and then averaged. For clarity and transparency, we have now provided three multi-panel plots (see Figure 1 – figure supplements 1-3) showing the individual pre- and post-bleach scans of absorbance spectra, fitted absorbance curve templates, and R² values from the best visual pigment template fit.
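
      The per-scan fitting and averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the fitting routine used in the study: the response does not say which template was used, so the alpha band of the Govardovskii et al. (2000) A1 template is assumed, scans are assumed to be normalised to a peak of 1, and the function names and the integer grid search are hypothetical.

```python
import math

def a1_alpha_band(wavelength_nm, lmax_nm):
    """Alpha band of the Govardovskii et al. (2000) A1 visual pigment template."""
    x = lmax_nm / wavelength_nm
    a = 0.8795 + 0.0459 * math.exp(-((lmax_nm - 300.0) ** 2) / 11940.0)
    return 1.0 / (math.exp(69.7 * (a - x)) + math.exp(28.0 * (0.922 - x))
                  + math.exp(-14.9 * (1.104 - x)) + 0.674)

def fit_lmax(wavelengths_nm, absorbance, candidates_nm):
    """Grid search for the lambda-max whose template minimises squared error."""
    def sse(lmax):
        return sum((a1_alpha_band(w, lmax) - y) ** 2
                   for w, y in zip(wavelengths_nm, absorbance))
    return min(candidates_nm, key=sse)

def mean_lmax(scans, wavelengths_nm, candidates_nm):
    """Fit each scan separately, then average the per-scan lambda-max values."""
    fits = [fit_lmax(wavelengths_nm, scan, candidates_nm) for scan in scans]
    return sum(fits) / len(fits)
```

      Fitting each scan before averaging, rather than fitting one template to the averaged curve, keeps a single contaminated scan from distorting the shape that the template is matched against.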

      It is worth noting that most of the cone absorbance spectra found in our study closely resemble, in both λmax and quality, those measured in another anemonefish species (Amphiprion akindynos) [see Supplementary Figure 1 in Stieb S. et al. (2019). Sci Rep. 9, 16459]. These cone λmax values can also be reconciled with previous estimates of opsin λmax based on amino acid sequences and cone opsin expression in the A. ocellaris retina characterised in Mitchell LJ et al. (2021). GBE, 13: evab184.

      Evidence that the unusual long-wavelength absorbance detected in a couple of the single cone (pre-bleach) measurements did not originate from visual pigment comes from the post-bleach scans, in which this absorbance persisted (i.e., it showed no photobleaching response); it was more likely caused by contaminants (e.g., blood, RPE pigment). UV absorbance in some of the double cone measurements (above that expected of the pre-bleach beta peak from chromophore spectral absorption) can be attributed to noise in the scans, as is quite typical of MSP, and/or to partial (accidental) bleaching from stray light sources. Although the utmost care was taken to minimise contamination and unintended bleaching, these are sometimes unavoidable.

      We refer the Reviewer to multiple published studies for further examples of typical MSP measurements that share similar levels of noise to ours e.g., see Figure 1 in Knott B. et al. (2013). JEB, 216:4454-4461; Figure 3 in Schott, RK et al. (2015). PNAS, 113(2): 356-361; Figure 2 in Dalton BE et al. (2014). Proc R Soc B. 281; Figure 5 in Tosetto, JE et al. (2021). Brain Behav Evol. 96: 103-123.

      Presentation

      The results are not presented in a straightforward way - at least for this reviewer. What is missing for me is a clear link between the psychometric curves in Figure 3A and the discrimination thresholds indicated in Figure 3B and Figure 4. Figure 3A is only discussed in the text on line 289 - after Figure 4 has been introduced and discussed. It would have been very helpful for me if the psychometric curves were first introduced and described, then the relation to Figure 3B was clearly indicated (perhaps with a single psychometric curve as an example). Similarly for Figure 4 the relationship between specific psychometric curves and the threshold plotted would be quite helpful. Currently it takes a careful reading to understand why being below the dashed line in Figure 4 is important.

      We have made the following changes: the psychometric curves are now introduced earlier in the results (lines 236-249), and the psychometric function comparison has been moved before the first mention of Figure 4. Additionally, to make the association between the plotted colour loci and psychometric curves clearer, we have added a smaller psychometric curve plot adjacent to the colour space (in Figure 3B), using red as an example, in which an averaged psychometric curve overlies the individual fish curves. The figure caption (lines 250-274) explains that the plotted colour loci and given thresholds are mean values calculated from the individual fish behavioural data.

      We have also added a brief reminder that the theoretical limit of colour discrimination is predicted by the RNL model as 1∆S, where in our task fish should be just able to distinguish targets from grey distractors (see lines 222-224). To clarify, the plotted values in Figure 4B are both the individual fish thresholds (points) and average threshold (black bar) per colour set. The individual threshold values are taken at a correct choice probability of 50% from fitted psychometric curves of fish behavioural performance (shown in Figure 3A).
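
      The link between a fitted psychometric curve and the threshold read from it can be sketched as follows. This is a minimal illustration only: the response does not state the functional form of the fitted curves, so a logistic function with a lower asymptote at chance is assumed, and the parameter names and values are hypothetical.

```python
import math

def psychometric(delta_s, midpoint, slope, chance):
    """Logistic curve rising from `chance` to 1.0 as colour distance grows."""
    return chance + (1.0 - chance) / (1.0 + math.exp(-slope * (delta_s - midpoint)))

def threshold_at(p_correct, midpoint, slope, chance):
    """Invert the curve: colour distance at which P(correct) equals p_correct."""
    if not chance < p_correct < 1.0:
        raise ValueError("p_correct must lie strictly between chance and 1")
    ratio = (1.0 - chance) / (p_correct - chance) - 1.0
    return midpoint - math.log(ratio) / slope

# Hypothetical fitted parameters for one fish and one colour set.
fish_threshold = threshold_at(0.5, midpoint=1.2, slope=3.0, chance=0.1)
```

      A threshold below 1∆S obtained this way would indicate better discrimination than the RNL model predicts, and one above 1∆S would indicate worse.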

      RNL model

      The data is fit and interpreted in the context of the receptor noise limited model. The paragraph in the discussion about complementary color pairs suggests that this model is incorrect (text around line 332). Consideration of how the results depend on the RNL model is important, especially given the interpretation here.

      The inability of the RNL model to account for the observed asymmetry between color discrimination thresholds implies that they cannot be solely attributed to photoreceptor noise. We can therefore infer from the asymmetry that thresholds are set by a higher-level process; whether this involves post-receptor processes within the inner retina or in the brain remains to be investigated. As explained in lines 396-397, one possibility is that activation of the UV receptor suppresses noise in the visual pathway or enhances the saliency of colors for anemonefish. The high sensitivity to violet-green, which was found in all six of the fish tested, is consistent with the heightened saliency of this color (lines 397-399).

      Figure 3B

      This is the key figure in the paper. But several issues make seeing the data in this figure difficult. First, the important part of the figure is buried near the origin and hard to see. Can you show a surface that connects the thresholds in the different chromatic directions, or otherwise highlight the regions of discriminable and not discriminable colors?

      See previous comment. In short, we have taken the advice of the Reviewer and added highlighted areas around the regions of discriminable colors in Figure 3B to help visually separate them from the non-discriminable regions of colors (from grey). Additionally, we have added an inset showing an enlarged image of the area surrounding the centre of colour space.

      Reviewer #2 (Public Review):

      Mitchell and colleagues examined the contribution of a UV-sensitive cone photoreceptor to chromatic detection in Amphiprion ocellaris, a type of anemonefish. First, they used biophysical measurements to characterize the response properties of the retinal receptors, which come in four spectrally-distinct subtypes: UV, M1, M2, and L. They then used these spectral sensitivities to construct a 4-dimensional (tetrahedral) color space in which stimuli with known spectral power distributions can be represented according to the responses they elicit in the four cone types. A novel five-LED display was used to test the fish's ability to detect "chromatic" modulations in this color space against a background of random-intensity, "achromatic" distractors that produce roughly equal relative responses in the four cone types. A subset of stimuli, defined by their high positive UV contrast, were more readily detected than other colors that contained less UV information. A well-established model was used to link calculated receptor responses to behavioral thresholds. This framework also enabled statistical comparisons between models with varying number of cone types contributing to discrimination performance, allowing inferences to be drawn about the dimensionality of color vision in anemonefish.

      The authors make a compelling case for how UV light in the anemonefish habitat is likely an important ecological source of information for guiding their behavior. The authors are to be commended for developing an elegant behavioral paradigm to assess visual performance and for incorporating a novel display device especially suited to addressing hypotheses about the role of UV light in color perception. While the data are suggestive of behavioral tetrachromacy in anemonefish, there are some aspects of the study that warrant additional consideration:

      1) One challenge faced by many biological imaging systems is longitudinal chromatic aberration (LCA) - that is, the focal power of the system depends on wavelength. In general, focal power increases with decreasing wavelength, such that shorter wavelengths tend to focus in front of longer wavelengths. In the human eye, at least, this focal power changes nonlinearly with wavelength, with the steepest changes occurring in the shorter part of the visible spectrum (Atchison & Smith, 2005). In the fish eye, where the visible spectrum extends to even shorter wavelengths, it seems plausible that a considerable amount of LCA may exist, which could in turn cause UV-enriched stimuli to be more salient (relative to the distractor pixels) due to differences in perceived focus rather than due solely to differences in their respective spectral compositions. Such a mechanism has been proposed by Stubbs & Stubbs (2016) as a means for supporting "color vision" in monochromatic cephalopods (but see Gagnon et al. 2016). It would be worth discussing what is known about the dispersive properties of the crystalline lens in A. ocellaris (or similar species), and whether optical factors could produce sufficient cues in the retinal image that might explain aspects of the behavioral data presented in the current study.

      This is an interesting point, and we appreciate the reviewer’s thoughtful comment regarding this topic, especially as LCA increases exponentially in the UV. Although we certainly cannot disprove such a mechanism in the present study, we are highly sceptical that LCA could be used by reef fish and is involved in the heightened saliency of UV stimuli. Previous work has found that LCA is mostly corrected for in the teleost retina of both marine and freshwater species by graded, multifocal lenses that focus different wavelengths at the same depth as their maximally sensitive cone photoreceptors [e.g., for evidence in African cichlids see Kröger, R. H. H. et al. (1999). J Comp Physiol. A, 184, 361-369; Malkki, P. E. & Kröger, R. H. H. (2005). J Opt. A, 7, 691-700; and for various reef fishes see Karpestam, B. et al. (2007). J Exp Biol., 210, 16: 2923-2931]. In essence, LCA is corrected in the eyes of many teleosts by accurately tuning longitudinal spherical aberration through a graded density lens. We draw particular attention to the last of these references, which comparatively examined the optical properties of reef fish lenses, including those of diurnal, planktivorous damselfishes (from the same family as anemonefishes, Pomacentridae). They found that not only were the lenses of these species highly UV-transmissive (as we show in anemonefish), but all were multifocal and capable of focusing both visible (non-UV) and UV wavelengths.

      All of the coastal cephalopod species examined thus far contain only one type of visual pigment, which is packed in their long photoreceptors (150-450 µm long outer segments) across the entire retina (Chung and Marshall 2016, Proc R Soc B). Theoretically, given these long photoreceptors, LCA and the resulting differences in focal length across different patches of photoreceptors, or at different depths of the outer segment, might provide cues for colour discrimination, although no behavioural evidence yet supports this hypothesis. Unlike the cephalopod case, the four spectrally distinct cone types in the anemonefish retina are arranged in a mosaic pattern and have very short outer segments (5-10 µm), which likely makes LCA a less effective cue in this retinal design.

      We have added a short paragraph (Lines 400-412) discussing the possibility of an optical mechanism contributing to heightened UV saliency with a particular focus on LCA and our thoughts on why we consider it an unlikely mechanism in anemonefish.

      2) The authors provide a quantitative description of anemonefish visual performance within the context of a well-developed receptor-based framework. However, it was less clear to me what inferences (if any) can be drawn from these data about the post-receptoral mechanisms that support tetrachromatic color vision in these organisms. Would specific cone-opponent processes account for instances where behavioral data diverged from predictions generated with the "receptor noise limited" model described in the text? The general reader may benefit from more discussion centered on what is known (or unknown) about the organization of cone-opponent processing in anemonefish and related species.

      In short, we do not know the specific opponent interactions of anemonefish cones. The RNL model assumes all possible opponent interactions in its calculations. From our results, very little can be said about the post-receptor mechanisms involved in their putative tetrachromatic vision. We would like to avoid overreaching beyond what our data can show. A future directions section has now been added to the discussion (lines 467-497), which briefly mentions the known UV opponency in larval zebrafish and that future investigation in anemonefish should attempt to disentangle the specific opponent (chromatic) and non-opponent (achromatic) circuits in the anemonefish retina.

      Reviewer #3 (Public Review):

      The comments below focus mainly on ways that the data and analysis as currently presented do not, to this reviewer, compel the conclusions the authors wish to draw. It is possible that further analysis and/or clarification in the presentation would more persuasively bolster the authors' position. It also seems possible that a presentation with more limited conclusions but clarity on exactly what has been demonstrated and where additional future work is needed would make a strong contribution to the literature.

      • Fig 3A. It might be worth emphasizing a bit more explicitly that the x-axis (delta S) is the result of a model fit to the data being shown, since this then means that if the RNL model fit the data perfectly, all of the thresholds would fall at deltaS = 1. They don't, so I would like to see some evaluation from the authors' experience with this model as to whether they think the deviations (looks like the delta S range is ~0.4 to ~1.6 in Figure 4B) represent important deviations of the data from the model, the non-significant ANOVA notwithstanding. For example, Figure 4B suggests that the sign of the fit deviations is driven by the sign of the UV contrast and that this is systematic, something that would not be picked up by the ANOVA. Quite a bit is made of the deviations below, but that the model doesn't fully account for the data should be brought out here, I think. As the authors note elsewhere, deviations of the data from the RNL model indicate that factors other than receptor noise are at play, and reminding the reader of this here at the first point it becomes clear would be helpful.

      We have now stated more explicitly in the figure caption for Figure 3A, that the delta S values presented were calculated by fitting fish behavioral data to the RNL model. To test the overall effect that the sign of the UV contrast had on the discrimination threshold, we have now included ‘contrast’ (positive or negative) as another fixed effect in the linear mixed effects model. We have now included details of this test in the results which shows the systematic effect (lines 338-340). Additionally, as suggested we now briefly introduce in the results the idea that factors other than receptor noise are causing the observed deviations in data from the RNL model.

      • (Line 217 ff, Figure 4, Supplemental Figure 4). If I'm understanding what the ANOVA is telling us, it is that, across color directions and fish (I think these are the two factors based on line 649), the predictions deviate significantly from the data (relative to the inter-fish variability) for the trichromatic models but not the tetrachromatic model. If that's not correct, please interpret this comment to mean that more explanation of the logic of the test would be helpful.

      The interpretation of the ANOVA by the Reviewer is mostly correct. We had the variables color set and Fish ID, with threshold delta S as the dependent variable. This showed that deviations from the predicted threshold were significant relative to the inter-fish variability for the trichromatic models. Missing details describing the ANOVA have now been added to the methods (lines 789-798).

      Assuming that the above is right about the nature of the test, then I don't think the fact that the tetrachromatic model has an additional parameter (noise level for the added receptor type) is being taken into account in the model comparison. That is, the trichromatic models are all subsets of the tetrachromatic model, and must necessarily fit the data worse. What we want to know is whether the tetrachromatic model is fitting better because its extra parameter is allowing it to account for measurement noise (overfitting), or whether it is really doing a better job accounting for systematic features of the data. This comparison requires some method of taking the different number of parameters into account, and I don't think the ANOVA is doing that work. If the models being compared were nested linear models, then an F-ratio test could be deployed, but even this doesn't seem like what is being done. And the RNL model is not linear in its parameters, so I don't think that would be the right model comparison test in any case.

      Typical model comparison approaches would include a likelihood ratio test, AIC/BIC sorts of comparisons, or a cross-validation approach.

      If the authors feel their current method does persuasively handle the model comparison, how it does so needs to be brought out more carefully in the manuscript, since one of the central conclusions of the work hinges at least in part on the appropriateness of such a statistical comparison.

      Our visual model comparisons were aimed at assessing whether a trichromatic or tetrachromatic model best fit the colour discrimination data. The trichromatic and tetrachromatic models assume two and three opponency pathways, respectively. If the fish were trichromatic rather than tetrachromatic, we would expect the RNL model to fit the data better with two opponency mechanisms than with three. We made this assessment because of the possibility that not all the cones contribute to colour vision, with some being used exclusively for achromatic tasks (e.g., luminance vision or motion detection). However, given our finding that the data were best fit by the tetrachromatic model (i.e., the behavioural discrimination thresholds more closely matched the theoretical prediction of 1∆S), it is likely that anemonefish used all four cones for colour vision.

      We have also now repeated our analysis using unweighted delta S values, which are calculated using general n-dimensional models of colour vision (using the PAVO2 package). These models essentially follow the same initial steps as the RNL model (and many others) but omit the receptor noise correction stage. After comparing (using ANOVA, see lines 303-311) the predicted thresholds with the data in this non-RNL space, we again found that the tetrachromatic model predictions did not deviate significantly from the data relative to individual fish performance; however, we also found that the data no longer differed from the values predicted by the trichromatic model without M2 cone input. In this case, it seems that the extra noise parameter did contribute to the difference in fit. Whether this is a biologically meaningful comparison (as all photoreceptors contain noise) is an open question. We have added a short statement explicitly framing our interpretation of anemonefish having a 3-D colour space as depending on how closely the RNL model predictions matched the data (lines 370-371, 506-508).
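
      One common way to account for differing numbers of parameters when comparing such models is an information criterion. The sketch below is not the analysis reported in this response; it is a minimal illustration of a least-squares AIC comparison, assuming Gaussian residuals, with hypothetical function names, residuals and parameter counts.

```python
import math

def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit with Gaussian errors (additive constant dropped)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params

def prefer_larger_model(resid_small, k_small, resid_large, k_large):
    """True if the larger model has the lower AIC despite its extra parameters."""
    return (aic_least_squares(resid_large, k_large)
            < aic_least_squares(resid_small, k_small))

# Hypothetical residuals (observed threshold minus predicted threshold) for
# nine colour directions under a smaller and a larger model.
resid_tri = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5]
resid_tetra = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1]
tetra_preferred = prefer_larger_model(resid_tri, 3, resid_tetra, 4)
```

      The penalty term `2 * n_params` means a larger model is preferred only when its improvement in fit outweighs the cost of its additional parameter.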

      • Also on the general point on conclusions drawn from the model fits, it seems important to note that rejecting a trichromatic version of the RNL model is not the same as rejecting all trichromatic models. For example, a trichromatic model that postulates limiting noise added after a set of opponent transformations will make predictions that are not nested within those of RNL trichromatic models. This point seems particularly important given the systematic failures of even the tetrachromatic version of the RNL model.

      This is a good point. We have limited our conclusions to specifically address trichromatic models generated within the framework of the RNL model by adding in the conclusion section that fish psychophysical thresholds were best explained by the RNL model when all four cone types contributed to colour vision (see lines 370-371, 506-508). In this same sentence, we have also added in parentheses that “suggesting (but not proving) tetrachromacy” (line 508). We have also edited the abstract to state that our results were “…best described by a tetrachromatic model using all four cone types…”, rather than stating we have shown tetrachromacy (lines 36-37).

      • More generally, attempts to decide whether some human observers exhibit tetrachromacy have taught us how hard this is to do. Two issues, beyond the above, are the following. 1) If the properties of a trichromatic visual system vary across the retina, then by imaging stimuli on different parts of the visual field an observer can in principle make tetrachromatic discriminations even though the visual system is locally trichromatic at each retinal location. 2) When trying to show that there is no direction in a tetrachromatic receptor space to which the observer is blind, a lot of color directions need to be sampled. Here, 9 directions are studied. Is that enough? How would we know? The following paper may be of interest in this regard: Horiguchi, Hiroshi, Jonathan Winawer, Robert F. Dougherty, and Brian A. Wandell. "Human trichromacy revisited." Proceedings of the National Academy of Sciences 110, no. 3 (2013): E260-E269. Although I'm not suggesting that the authors conduct additional experiments to try to address these points, I do think they need to be discussed.

      We agree with the reviewer that colour discriminability achieved by tetrachromatic vision could in theory be achieved by the combined effect of localised, distinct forms of trichromacy. Evidence in other fishes suggests that such multiple forms of trichromacy across the retina likely exist in many species. However, the behavioural effects of this retinal arrangement remain to be studied, likely because they are extremely difficult to test. We have added a new section titled “future directions” (Lines 474-489), in which we discuss the possibility that distinct forms of trichromacy in the anemonefish retina could in theory achieve colour discrimination on par with tetrachromatic vision. We also give suggestions on how this could be investigated.

      Although we tried to include as many colour directions as practically possible in our experiment, we have certainly not provided an exhaustive range that completely encompasses anemonefish colour space. Whether 9 colour directions are adequate to assess the dimensionality of their color vision is difficult to say. As addressed in the previous comment, we now acknowledge this limitation by refining our conclusion, saying that our results do not prove tetrachromacy.

      • Line 277 ff. After reading through the paper several times, I remain unsure about what the authors regard as their compelling evidence that the UV cone has a higher sensitivity or makes an omnibus higher contribution to sensitivity than other cones (as stated in various forms in the title, Lines 37-41, 56-57, 125, 313, 352 and perhaps elsewhere).

      At first, I thought the key point was that the receptor noise inferred via the RNL model is slightly lower (0.11) for the UV cone than for the double cones (0.14). And this is the argument made explicitly at line 326 of the discussion. But if this is the argument, what needs to be shown is that the data reject a tetrachromatic version of the RNL model where the noise value of all the cones is locked to be the same (or something similar), with the analysis taking into account the fewer parametric degrees of freedom where the noise parameters are so constrained. That is, a careful model comparison analysis would be needed. Such an analysis is not presented that I see, and I need more convincing that the difference between 0.11 and 0.14 is a real effect driven by the data. Also, I am not sanguine that the parameters of a model that in some systematic ways fails to fit the data should be taken as characterizing properties of the receptors themselves (as sometimes seems to be stated as the conclusion we should draw).

      We have performed various modelling scenarios in which receptor noise was adjusted for each channel; the UV channel was consistently found to be more sensitive than the other channels. In (the original) Supplementary Figure 6 (now Figure 4 – figure supplements 1 and 2), we show predicted dS values calculated using receptor noise levels varied in the manner that the Reviewer suggests, ranging from 0.05 to 0.15 and, most importantly, including scenarios where receptor noise was held equal across cone types and others where it differed between single cones and double cones. None of these scenarios equalised sensitivity across all four channels, which suggests that the UV channel is more sensitive by an unknown mechanism that is unrelated to noise levels. Our best-fit receptor noise values of 0.11 (for single cones) and 0.14 (for double cones) are estimates and should be treated as such until actual receptor noise measurements are made.
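
      For readers unfamiliar with how receptor noise enters these predictions, the standard tetrachromatic form of the receptor-noise-limited colour distance can be sketched as follows. This is an illustrative sketch rather than the code used for the figure supplements: the stimulus contrasts are hypothetical, and the assignment of UV and M1 to single cones and of M2 and L to double cones is an assumption made only for this example.

```python
import math
from itertools import combinations

def rnl_delta_s_tetrachromat(delta_f, noise):
    """Receptor-noise-limited colour distance for four receptor channels.

    delta_f: log receptor contrasts, ln(q_stimulus / q_background), one per cone.
    noise:   receptor noise (Weber fraction) for each cone, in the same order.
    """
    idx = range(4)
    numerator = 0.0
    for i, j in combinations(idx, 2):
        # Each noise pair (i, j) weights the contrast difference of the other two cones.
        p, q = [m for m in idx if m not in (i, j)]
        numerator += (noise[i] * noise[j]) ** 2 * (delta_f[p] - delta_f[q]) ** 2
    denominator = sum((noise[i] * noise[j] * noise[k]) ** 2
                      for i, j, k in combinations(idx, 3))
    return math.sqrt(numerator / denominator)

# Cone order: UV, M1, M2, L. Contrasts are hypothetical (a UV-positive stimulus).
delta_f = [0.10, 0.0, 0.0, 0.0]
equal_noise = [0.10, 0.10, 0.10, 0.10]
# 0.11 (single cones) and 0.14 (double cones) are the best-fit values quoted above;
# which cones are single and which are double is assumed here for illustration.
split_noise = [0.11, 0.11, 0.14, 0.14]

ds_equal = rnl_delta_s_tetrachromat(delta_f, equal_noise)
ds_split = rnl_delta_s_tetrachromat(delta_f, split_noise)
```

      Because the distance depends only on differences between receptor contrasts, a stimulus that changes all four cone signals by the same factor gives a distance of zero regardless of the noise values chosen.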

      Then, I thought maybe the argument is not that the noise levels differ, but rather that the failures of the model are in the direction of thresholds being under predicted for discriminations that involve UV cone signals. That's what seems to be being argued here at lines 277 ff, and then again at lines 328 ff of the discussion. But then the argument, as I read it in more detail in both places, switches from being about the UV cones per se to being about positive versus negative UV contrast. That's fine, but it's distinct from an argument that favors omnibus enhanced UV sensitivity, since both the UV increments and decrements are conveyed by the UV cone; it's an argument for differential sensitivity for increments versus decrements in UV mediated discriminations. The authors get to this on line 334 of the discussion, but if the point is an increment/decrement asymmetry the title and many of the terser earlier assertions should be reworked to be consistent with what is shown.

      To clarify our argument, we found that the colour discrimination thresholds were systematically lower than predicted by the RNL model for colours that elicited higher UV cone stimulation relative to the other cone types. We refer to these colours as UV positive, based on the sign of their contrast against the grey distractors, which was produced by a higher UV/V LED channel output (i.e., in a positive direction). In contrast, colours with UV negative chromatic contrast had lower UV cone stimulation relative to the other cone types. Therefore, our interpretation of the importance of UV cone signals for colour discrimination is congruent with the results. In the discussion, we suggest the possibility that activation of the UV receptor suppresses noise downstream in the visual pathway or enhances the saliency of colours (see lines 397-398). This activation of the UV receptor would, of course, be at its highest for colours with positive UV chromatic contrast.

      Note that we have added to the discussion the possibility that colour preferences or a difference in attentiveness might have contributed to differences in discrimination thresholds (see discussion lines 412-413, 427-428, 433-435, 456-466, and 469-473). However, we consider it a less likely explanation due to a couple of reasons, including 1) a lack of difference in responsiveness across colour sets in their timing to peck the target, and 2) any non-learnt bias would have likely been overridden or at least weakened by training prior to the experiment where colours were rewarded equally (see lines 462-466).

      We have edited the results (lines 334-352) to make our point clearer and changed the subtitle to be more explicit: “Lower discrimination thresholds induced by positive UV contrast”. The subsection begins by explaining the different types of UV chromatic contrast by elevation angle and then shows how this division among colour sets was a major determinant of colour discrimination thresholds.

      Perhaps the argument with respect to model deviations and UV contrast independent of sign could be elaborated to show more systematically that the way the covariation with the contrasts of the other cone stimulations in the stimulus set goes, the data do favor deviations from the RNL in the direction of enhanced sensitivity to UV cone signals, but if this is the intent I think the authors need to think more about how to present the data in a manner that makes it more compelling than currently, and walk the reader carefully through the argument.

      We have added to the results the linear mixed-effects model output with ‘contrast’ (positive/negative) added as a fixed effect. This analysis shows that the sign of UV contrast was a strong predictor of threshold (see our responses to the previous comments and lines 399-401, 790-799).

      • On this point, if the authors decide to stick with the enhanced UV sensitivity argument in the revision, a bit more care about what is meant by "the UV cone has a comparatively high sensitivity (line 313 and throughout)" needs more unpacking. If it is that these cones have lower inferred noise (in the context of a model that doesn't account for at least some aspects of the data), is this because of properties of the UV cones, or the way that post-receptoral processing handles the signals from these cones mimicking a cone effect in the model. And if it is thought that it is because of properties of the cones, some discussion of what those properties might be would be helpful. As I understand the RNL model, relative numbers of cones of each type are taken into account, so it isn't that. But could it be something as simple as higher photopigment density or larger entrance aperture (thus more quantum catches and higher SNR)?

      It is unknown what aspect of the cone morphology or physiology sets the activation or inactivation threshold. Electrophysiological data collected from the UV cones of other fish species e.g., in goldfish and zebrafish [see Hawryshyn & Beauchamp (1985). 25, Vis Res.; and Yoshimatsu et al. (2020). 107, Neuron.] show that they have exceptionally high sensitivity. What has not been shown is that having a UV cone can improve colour discrimination.

      Previous quantitative cone opsin gene expression analysis showed that the single cone opsins (SWS1 and SWS2B) are expressed at lower levels than all double cone opsin genes. This difference in expression, combined with the smaller size of single cone outer segments relative to those of the double cones, makes it unlikely that a larger photoreceptor size, higher volume or packing density of visual pigment is responsible. Contrary to our findings, these aspects of the different cone types (if they had an effect) would instead predict that double cones have a higher SNR, and non-UV colours would be more discriminable. We have now added these details to the discussion (see lines 391-397).

      • Line 288 ff. The fact that the slopes of the psychometric functions differed across color directions is, I think, a failure of the RNL model to describe this aspect of the data, and tells us that a simple summary of what happens for thresholds at delta S = 1 does not generalize across color directions for other performance levels. Since one of the directions where the slope is shallower is the UV direction, this fact would seem to place serious limits on the claim that discrimination in the UV direction is enhanced relative to other directions, but it goes by here without comment along those lines. Some comment here, both about implications for fit of RNL model and about implications for generalizations about efficacy of UV receptor mediated discrimination and UV increment/decrement asymmetries, seems important.

      The variation in the psychometric functions is difficult to interpret and cannot be explained by the RNL model. What the RNL model predicts is delta S based on low level factors (namely receptor noise). In the discussion, we completely agree with the notion that the asymmetry in thresholds from predicted values, and the variation in psychometric slopes cannot be explained by the RNL model, e.g., this is heavily implied by “colour discrimination thresholds cannot be directly attributed to noise in the early stages of the visual pathway…” (lines 388-390). To clarify the inability of the RNL model to account for this aspect of the data, we have included a statement (see line 390).

      It is a good point that this could be an indication of heterogeneity in colour space. Heterogeneity in discrimination thresholds across animal colour space (both surrounding the threshold area and for more saturated regions) has been explored in detail using trichromatic triggerfish by Green N. F. et al. (2022). JEB, 7(225):jeb243533. We have added this idea to the discussion (see lines 490-498). For UV, it seems that two of the five fish (#34 and 20) had noticeably shallower curves than the others tested for UV (fish #19, 33, 36). Both also varied more in their ability to distinguish targets, as shown by their wider confidence intervals. One of these two fish (#34) was retested for UV at the end of the experiment, and in the secondary assessment had a steeper psychometric curve more in line with the other fish in the experiment (see Figure 3 – figure supplement 1 and added lines 247-250). Based on this discrepancy in performance between assessments, it is also possible that individual learning effects had a role in impacting the shape of the psychometric curve. Note, this had minimal effect on colour discrimination thresholds and any differences were in the direction of change observed across colour sets in the experiment (i.e., lower dS for UV positive directions).
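
      The reviewer's point about slopes can be made concrete with a minimal sketch. It is an illustration only, not the analysis used in the manuscript: the logistic form, the chance level of 0.25, and the slope values are all assumptions chosen for the example. Two psychometric functions share the same threshold (delta S = 1) but differ in slope, so a comparison at threshold says little about performance at other stimulus levels.

```python
import math


def psychometric(delta_s, threshold, slope, chance=0.25, lapse=0.0):
    """Logistic psychometric function: P(correct) as a function of delta S.

    `threshold` is the delta S at which performance lies halfway between
    the chance level and the upper asymptote; `slope` sets the steepness.
    The chance and lapse values are hypothetical placeholders.
    """
    core = 1.0 / (1.0 + math.exp(-slope * (delta_s - threshold)))
    return chance + (1.0 - chance - lapse) * core


# Two hypothetical colour directions with identical thresholds
# but different slopes (values chosen for illustration only).
steep = dict(threshold=1.0, slope=6.0)
shallow = dict(threshold=1.0, slope=2.0)

for ds in (0.5, 1.0, 1.5, 2.0):
    p_steep = psychometric(ds, **steep)
    p_shallow = psychometric(ds, **shallow)
    print(f"delta S = {ds:.1f}: steep = {p_steep:.3f}, shallow = {p_shallow:.3f}")
```

      In this sketch the two curves coincide only at the threshold; above it the steeper curve predicts better performance, and below it the shallower curve does. This is why a difference in slope limits how far a threshold-based conclusion generalizes.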

      • Line 357 ff. Up until this point, all of the discussion of differences in threshold across stimulus sets has been in terms of sensitivity. Here the authors (correctly) raise the possibility that a difference in "preference" across stimulus sets could drive the difference in thresholds as measured. Although the discussion is interesting and germane, it does to some extent further undercut the security of conclusions about differential sensitivity across color directions relative to the RNL model predictions, and that should be brought out for the reader here. The authors might also discuss how a future experiment might differentiate between a preference explanation and a sensitivity explanation of threshold differences.

      We have now added a paragraph (see lines 469-473) discussing that future work should test for color preferences and suggest how this could be done using a similar foraging task. We also include our thoughts immediately prior on why it is unlikely that a colour preference was a major contribution towards the results. In short, we consider it unlikely as fish showed no evidence of reduced latency for pecking at targets across the colour sets and because the training regime prior to the experiment equally rewarded fish for all colours and would likely have overridden a strong preference (at least in this specific foraging context).

      • RNL model. The paper cites a lot of earlier work that used the RNL model, but I think many readers will not be familiar with it. A bit more descriptive prose would be helpful, and particularly noting that in the full dimensional receptor space, if the limiting noise at the photoreceptors is Gaussian, then the isothreshold contour will be a hyper-ellipsoid with its axes aligned with the receptor directions.

      We have now added an explanation of the RNL model (see lines 141-151), particularly of its assumptions that it only receives chromatic input and that discrimination is limited by noise arising in the photoreceptors and not by any specific opponent mechanisms. We also mention the expected hyper-ellipsoid shape of isothreshold contours if receptor noise is Gaussian. Note, while we appreciate the importance of the reader understanding the basic functionality of the model, we wanted to avoid overloading the introduction with details on the RNL model, which is not the focus of the paper. The RNL model has been well established in the field of visual ecology and animal vision research for well over a decade and has been thoroughly dissected by previous methodological reviews. We refer to one of the more recent reviews, by Olsson et al. (2018) Behav Ecol. 29(2):273-282, and direct the reader to the methods section for further details on the RNL model.
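
      For readers unfamiliar with the model, the following is a minimal sketch of the standard RNL distance, assuming independent Gaussian noise in each receptor channel and no achromatic contribution. It is not the code used in the study, and the function name, signal differences, and noise values are hypothetical. In the full receptor space the isothreshold contour is the axis-aligned hyper-ellipsoid sum((df_i / e_i)^2) = constant; discarding the achromatic signal amounts to minimizing that quantity over a common shift of all channels, which yields the expression implemented below.

```python
import math


def rnl_delta_s(delta_f, noise):
    """Chromatic distance (delta S) for n receptor channels under the RNL model.

    delta_f: per-receptor signal differences between two stimuli
             (commonly differences in log quantum catch).
    noise:   per-receptor noise standard deviations e_i.

    Assumes independent Gaussian receptor noise and that a shift common
    to all channels (the achromatic signal) is ignored.
    """
    if len(delta_f) != len(noise) or len(noise) < 2:
        raise ValueError("need matching lists with at least two receptors")
    weights = [1.0 / (e * e) for e in noise]
    quad = sum(w * d * d for w, d in zip(weights, delta_f))
    lin = sum(w * d for w, d in zip(weights, delta_f))
    # Subtracting the second term removes the component along the
    # achromatic direction (1, 1, ..., 1).
    ds_squared = quad - lin * lin / sum(weights)
    return math.sqrt(max(ds_squared, 0.0))


# Hypothetical four-receptor example (values are illustrative only).
example_noise = [0.10, 0.12, 0.14, 0.14]
example_delta_f = [0.20, 0.05, -0.05, 0.00]
print(rnl_delta_s(example_delta_f, example_noise))
```

      For two and three receptors this expression reduces to the familiar closed-form dichromatic and trichromatic RNL equations, and a stimulus change that shifts all receptor signals equally gives delta S = 0.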

      • Use of cone isolating stimuli? For showing that all four cone classes contribute to what the authors call color discrimination, a more direct approach would seem to be to use stimuli that target stimulation of only one class of cone at a time. This might require a modified design in which the distractors and target were shown against a uniform background and approximately matched in their estimated effect on a putative achromatic mechanism. Did the authors consider this approach, and more generally could they discuss what they see as its advantages and disadvantages for future work.

      The Reviewer is correct in that a targeted approach of isolated cone stimulation would be the optimal approach to demonstrating tetrachromatic colour vision. However, the extreme spectral overlap in the absorption curves of anemonefish cones, particularly in the mid-wavelength region, makes this problematic with the current LED display. We added to the discussion ways that this could be studied in the future (see lines 474-489). This might be possible (but still challenging) using a monochromator, but such technology severely limits the diversity of stimuli which can be created and usually restricts experiments to a simple paired choice design (or grey card experiment). The traditional paired choice experiment requires animals to be trained to distinguish a specific colour, while the Ishihara-like task trains animals to distinguish targets using an odd-one-out approach. This latter approach is highly efficient, as it does not require retraining when testing a new colour (i.e., fish learnt the task, not a specific colour). Here, we wanted to assess colour discrimination in multiple directions to compare performance, and the flexible LED display combined with a generalisable task was important.

      The above assumes that anemonefish do not use multiple trichromatic systems. If they did, the use of standard experimental stimuli (e.g., a monochromator, an LED display) would be unsuitable as they illuminate the whole retina. To definitively test the range of opponent interactions, it would be necessary to make electrophysiological measurements targeting the transmitting neurons using a retinal multielectrode array (MEA) approach or by in-vivo calcium imaging (lines 484-486).

      We understand that our results are not a direct test of the dimensionality of anemonefish colour vision and should not be interpreted as such, as we do not have direct evidence of tetrachromacy. To recognize this limitation of our data, we have drawn back some of our conclusive statements that claimed to have demonstrated tetrachromacy.

    2. Reviewer #3 (Public Review):

      The comments below focus mainly on ways that the data and analysis as currently presented do not to this reviewer compel the conclusions the authors wish to draw. It is possible that further analysis and/or clarification in the presentation would more persuasively bolster the authors' position. It also seems possible that a presentation with more limited conclusions but clarity on exactly what has been demonstrated and where additional future work is needed would make a strong contribution to the literature.

      * Fig 3A. It might be worth emphasizing a bit more explicitly that the x-axis (delta S) is the result of a model fit to the data being shown, since this then means that if RNL model fit the data perfectly, all of the thresholds would fall at deltaS = 1. They don't, so I would like to see some evaluation from the authors' experience with this model as to whether they think the deviations (looks like the delta S range is ~0.4 to ~1.6 in Figure 4B) represent important deviations of the data from the model, the non-significant ANOVA notwithstanding. For example, Figure 4B suggests that the sign of the fit deviations is driven by the sign of the UV contrast and that this is systematic, something that would not be picked up by the ANOVA. Quite a bit is made of the deviations below, but that the model doesn't fully account for the data should be brought out here I think. As the authors note elsewhere, deviations of the data from the RNL model indicate that factors other than receptor noise are at play, and reminding the reader of this here at the first point it becomes clear would be helpful.

      * (Line 217 ff, Figure 4, Supplemental Figure 4.) If I'm understanding what the ANOVA is telling us, it is that, across color directions and fish (I think these are the two factors based on line 649), the predictions deviate significantly from the data (relative to the inter-fish variability) for the trichromatic models but not the tetrachromatic model. If that's not correct, please interpret this comment to mean that more explanation of the logic of the test would be helpful.

      Assuming that the above is right about the nature of the test, then I don't think the fact that the tetrachromatic model has an additional parameter (noise level for the added receptor type) is being taken into account in the model comparison. That is, the trichromatic models are all subsets of the tetrachromatic model, and must necessarily fit the data worse. What we want to know is whether the tetrachromatic model is fitting better because its extra parameter is allowing it to account for measurement noise (overfitting), or whether it is really doing a better job accounting for systematic features of the data. This comparison requires some method of taking the different number of parameters into account, and I don't think the ANOVA is doing that work. If the models being compared were nested linear models, then an F-ratio test could be deployed, but even this doesn't seem like what is being done. And the RNL model is not linear in its parameters, so I don't think that would be the right model comparison test in any case.

      Typical model comparison approaches would include a likelihood ratio test, AIC/BIC sorts of comparisons, or a cross-validation approach.
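
      As a minimal sketch of what such a comparison could look like (an illustration only, not an analysis from the manuscript), the snippet below computes AIC and BIC and a likelihood ratio test for two nested models that differ by one free parameter. The log-likelihoods, parameter counts, and number of observations are made-up placeholders, and the chi-square reference distribution with one degree of freedom is the usual large-sample approximation for nested models.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def lrt_one_extra_param(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value). The p-value uses the chi-square
    distribution with 1 degree of freedom, whose survival function is
    erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical fits: a reduced (e.g. trichromatic) model with 3 noise
# parameters and a full (e.g. tetrachromatic) model with 4.
loglik_tri, k_tri = -120.0, 3
loglik_tetra, k_tetra = -112.0, 4
n_obs = 45  # placeholder number of threshold measurements

print("AIC:", aic(loglik_tri, k_tri), aic(loglik_tetra, k_tetra))
print("BIC:", bic(loglik_tri, k_tri, n_obs), bic(loglik_tetra, k_tetra, n_obs))
print("LRT:", lrt_one_extra_param(loglik_tri, loglik_tetra))
```

      All three criteria penalize the extra parameter explicitly, which is the property the comment above is asking for; cross-validation would achieve the same goal by scoring each model on held-out data.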

      If the authors feel their current method does persuasively handle the model comparison, how it does so needs to be brought out more carefully in the manuscript, since one of the central conclusions of the work hinges at least in part on the appropriateness of such a statistical comparison.

      * Also on the general point on conclusions drawn from the model fits, it seems important to note that rejecting a trichromatic version of the RNL model is not the same as rejecting all trichromatic models. For example, a trichromatic model that postulates limiting noise added after a set of opponent transformations will make predictions that are not nested within those of RNL trichromatic models. This point seems particularly important given the systematic failures of even the tetrachromatic version of the RNL model.

      * More generally, attempts to decide whether some human observers exhibit tetrachromacy have taught us how hard this is to do. Two issues, beyond the above, are the following. 1) If the properties of a trichromatic visual system vary across the retina, then by imaging stimuli on different parts of the visual field an observer can in principle make tetrachromatic discriminations even though the visual system is locally trichromatic at each retinal location. 2) When trying to show that there is no direction in a tetrachromatic receptor space to which the observer is blind, a lot of color directions need to be sampled. Here, 9 directions are studied. Is that enough? How would we know? The following paper may be of interest in this regard: Horiguchi, Hiroshi, Jonathan Winawer, Robert F. Dougherty, and Brian A. Wandell. "Human trichromacy revisited." Proceedings of the National Academy of Sciences 110, no. 3 (2013): E260-E269. Although I'm not suggesting that the authors conduct additional experiments to try to address these points, I do think they need to be discussed.

      * Line 277 ff. After reading through the paper several times, I remain unsure about what the authors regard as their compelling evidence that the UV cone has a higher sensitivity or makes an omnibus higher contribution to sensitivity than other cones (as stated in various forms in the title, Lines 37-41, 56-57, 125, 313, 352 and perhaps elsewhere).

      At first, I thought the key point was that the receptor noise inferred via the RNL model was slightly lower (0.11) for the UV cone than for the double cones (0.14). And this is the argument made explicitly at line 326 of the discussion. But if this is the argument, what needs to be shown is that the data reject a tetrachromatic version of the RNL model where the noise value of all the cones is locked to be the same (or something similar), with the analysis taking into account the fewer parametric degrees of freedom where the noise parameters are so constrained. That is, a careful model comparison analysis would be needed. Such an analysis is not presented that I see, and I need more convincing that the difference between 0.11 and 0.14 is a real effect driven by the data. Also, I am not sanguine that the parameters of a model that in some systematic ways fails to fit the data should be taken as characterizing properties of the receptors themselves (as sometimes seems to be stated as the conclusion we should draw).

      Then, I thought maybe the argument is not that the noise levels differ, but rather that the failures of the model are in the direction of thresholds being underpredicted for discriminations that involve UV cone signals. That's what seems to be being argued here at lines 277 ff, and then again at lines 328 ff of the discussion. But then the argument as I read it in more detail in both places switches from being about the UV cones per se to being about positive versus negative UV contrast. That's fine, but it's distinct from an argument that favors omnibus enhanced UV sensitivity, since both the UV increments and decrements are conveyed by the UV cone; it's an argument for differential sensitivity for increments versus decrements in UV mediated discriminations. The authors get to this on line 334 of the discussion, but if the point is an increment/decrement asymmetry the title and many of the terser earlier assertions should be reworked to be consistent with what is shown.

      Perhaps the argument with respect to model deviations and UV contrast independent of sign could be elaborated to show more systematically that the way the covariation with the contrasts of the other cone stimulations in the stimulus set goes, the data do favor deviations from the RNL in the direction of enhanced sensitivity to UV cone signals, but if this is the intent I think the authors need to think more about how to present the data in a manner that makes it more compelling than currently, and walk the reader carefully through the argument.

      * On this point, if the authors decide to stick with the enhanced UV sensitivity argument in the revision, a bit more care about what is meant by "the UV cone has a comparatively high sensitivity (line 313 and throughout)" needs more unpacking. If it is that these cones have lower inferred noise (in the context of a model that doesn't account for at least some aspects of the data), is this because of properties of the UV cones, or the way that post-receptoral processing handles the signals from these cones mimicking a cone effect in the model. And if it is thought that it is because of properties of the cones, some discussion of what those properties might be would be helpful. As I understand the RNL model, relative numbers of cones of each type are taken into account, so it isn't that. But could it be something as simple as higher photopigment density or larger entrance aperture (thus more quantum catches and higher SNR)?

      * Line 288 ff. The fact that the slopes of the psychometric functions differed across color directions is, I think, a failure of the RNL model to describe this aspect of the data, and tells us that a simple summary of what happens for thresholds at delta S = 1 does not generalize across color directions for other performance levels. Since one of the directions where the slope is shallower is the UV direction, this fact would seem to place serious limits on the claim that discrimination in the UV direction is enhanced relative to other directions, but it goes by here without comment along those lines. Some comment here, both about implications for fit of RNL model and about implications for generalizations about efficacy of UV receptor mediated discrimination and UV increment/decrement asymmetries, seems important.

      * Line 357 ff. Up until this point, all of the discussion of differences in threshold across stimulus sets has been in terms of sensitivity. Here the authors (correctly) raise the possibility that a difference in "preference" across stimulus sets could drive the difference in thresholds as measured. Although the discussion is interesting and germane, it does to some extent further undercut the security of conclusions about differential sensitivity across color directions relative to the RNL model predictions, and that should be brought out for the reader here. The authors might also discuss how a future experiment might differentiate between a preference explanation and a sensitivity explanation of threshold differences.

      * RNL model. The paper cites a lot of earlier work that used the RNL model, but I think many readers will not be familiar with it. A bit more descriptive prose would be helpful, and particularly noting that in the full dimensional receptor space, if the limiting noise at the photoreceptors is Gaussian, then the isothreshold contour will be a hyper-ellipsoid with its axes aligned with the receptor directions.

      * Use of cone isolating stimuli? For showing that all four cone classes contribute to what the authors call color discrimination, a more direct approach would seem to be to use stimuli that target stimulation of only one class of cone at a time. This might require a modified design in which the distractors and target were shown against a uniform background and approximately matched in their estimated effect on a putative achromatic mechanism. Did the authors consider this approach, and more generally could they discuss what they see as its advantages and disadvantages for future work.

    1. Author Response

      Reviewer #1 (Public Review):

      Precise regulation of gamete fusion ensures that offspring will have the same ploidy as the parents. However, breaking this regulation can be useful for plant breeding. Haploid induction followed by chemical-induced genome doubling can be used to fix desirable genotypes, while triparental hybrids where two sperm cells with two different genotypes fertilize an egg cell can be advantageous for bypassing hybridization barriers to create interspecies hybrids with increased fitness. This manuscript follows up on a previous study from the same research group that used a clever high throughput polyspermy detection assay (HIPOD) to show that wild-type Arabidopsis naturally forms triparental hybrids at very low frequencies (less than 0.05% of progeny) and that these triparental hybrids can bypass dosage barriers in the endosperm (Nakel, et al., 2017). Mao and co-authors hypothesized that mutants that conferred polytubey, the attraction of multiple pollen tubes by mutant female gametophytes, would also increase the rate of triparental hybrids. They used a double mutant in the endopeptidase genes ECS1 and ECS2 which had previously been reported to induce supernumerary pollen tube attraction to test this hypothesis with their two-component HIPOD system in which one pollen donor constitutively expresses the mGAL4-VP16 transcription factor while the second pollen donor carries an herbicide resistance gene regulated by the GAL4-responsive UAS promoter. Triparental hybrids are detected as herbicide-resistant progeny from wild-type Arabidopsis flowers that have been pollinated by the two paternal genotypes. The authors convincingly show that the ecs1 ecs2-1 double mutant more than doubled the frequency of triparental, triploid hybrids in HIPOD crosses. 
      They next tested the hypothesis that this increase in triparental hybrids was due to a gametophytic effect by using an ecs1-/- ecs2-1/ECS2 maternal parent in the HIPOD assay and testing whether the ecs2-1 mutant allele was preferentially inherited in triparental hybrids. The mutant allele was inherited at a much higher rate than expected, confirming their hypothesis.

      The triparental hybrid results with the ecs1 ecs2 mutant were not that surprising since the presence of extra sperm cells gives more opportunities for triparental hybrids to form, especially if gamete fusion is misregulated. However, an unexpected result came when the authors used aniline blue staining to analyze the ecs1 ecs2 polytubey phenotype. They confirmed that the double mutant had increased levels of polytubey compared to wild-type ovules, but they also noticed that 13% of seeds were not developing normally. This phenotype was confirmed with a second ecs2 allele and was complemented with both ECS1 and ECS2 transgenes under their native promoters. Microscopic analysis revealed normal gametophyte morphology before fertilization, but 8% of pollinated ovules failed to develop an embryo and 7% failed to develop endosperm, suggesting single fertilization events. In a logical set of experiments, they followed up on this result by crossing ecs1 ecs2 with pollen carrying a fluorescent reporter that would be expressed in developing embryos and endosperm. In this experiment, they were again surprised. Some of the wild-type-looking seeds lacked a paternal contribution (i.e. no fluorescent signal from the paternal reporter construct) in the embryo. This prompted them to look more closely at the progeny, upon which they detected small plants that were haploid. They confirmed the haploid nature by chromosome spreads. Finally, they used interaccession crosses between ecs1 ecs2 (Col-0) and Landsberg to verify that haploid progeny only carried maternal alleles of markers on all five chromosomes, indicating that the ecs1 ecs2 genotype can induce maternal haploids.

      This interesting study highlights the importance of following up on unexpected results. The conclusions are well-supported by the data and quite exciting. Paternal haploid inducers have been discovered in several species, but this is one of only two examples of maternal haploid induction. While the percentage of maternal haploids is very low, this phenomenon could be useful for plant breeding.

      Weaknesses

      The data in the manuscript is intriguing, but the question of how the same mutant combination promotes the formation of both triploid and haploid progeny remains unanswered and is not thoroughly discussed, nor is any model suggested for how the ECS1/2 peptidases could play a role in regulating gamete fusion and/or repressing parthenogenesis. A second unanswered question is whether the maternal haploids are a result of failed plasmogamy or karyogamy between the egg and sperm leading to parthenogenesis or a result of paternal genome elimination after plasmogamy. In figure 3B, the authors attempted to test whether plasmogamy occurs between the male and female gametes in ecs1 ecs2 ovules by crosses with pollen that expresses a mitochondrial marker under control of the pRPS5a promoter which is active in sperm cells as well as embryos and endosperm of fertilized ovules. This experiment allowed them to detect sperm cells that had not fused with the egg and central cell at 2 days after pollination. They also counted the percentage of seeds that expressed the mitochondrial marker in both embryo and endosperm at 2 DAP and found that ecs1 ecs2 mutants had a 20% reduction of visible mitochondria in embryo sacs compared to wildtype. They conclude that the result indicates a potential plasmogamy defect. However, the dependability of this marker is questionable since only ~55% of wild-type seeds had detectable signal in the embryo and endosperm. The authors imply that this experiment could be used to test plasmogamy, but it is not clear how any conclusions related to the abnormal seed phenotype could be drawn from examining the rate of signal in both the embryo and endosperm. Since the mitochondrial marker was not expressed from a sperm-specific promoter, the fluorescent signal at 2DAP is likely due to new gene expression from pRPS5a in the fertilized embryo and endosperm, not an indication of the presence of sperm-derived mitochondria. 
      Perhaps an earlier timepoint could be used, as well as a sperm-specific promoter instead of pRPS5a, to answer the question of whether plasmogamy is happening in the ecs1 ecs2 ovules.

      Thanks for the suggestion. We now provide two additional data sets as evidence that ecs1 ecs2 mutant plants indeed exhibit single fertilization events that lead to fertilization recovery.

      We determined the fertilization failure by checking the decondensation of HTR10-RFP-labelled sperm nuclei at 8-10 HAP (Figure 3B) and the frequency of heterofertilization through a dual pollination experiment (Figure 3C-E) (see above).

      Reviewer #2 (Public Review):

      The manuscript reports the triploid and haploid productions using an ecs1ecs2 mutant as the maternal donor, in addition to the evaluation of the sexual process observed in the mutant. The indicated data show exquisite quality. To improve the content, I recommend carefully reconsidering the descriptions because some of the insights would cause a stir in the controversy regarding ECS1&2 functions in plant reproduction.

      Strengths

      Triploid production by a combination of ecs1ecs2 mutant and HIPOD system has potential as a future plant breeding tool. Moreover, it's intriguing that both triploid and haploid productions were achieved using the same mutant as a maternal donor. I think authors can claim the value of their results more by adding descriptions about the usefulness of the aneuploid plants in plant breeding history.

      The evidence of the persistent synergid nucleus (Figure 3A) is a critical insight reported by this study. As Maruyama et al. (2013) reported by live cell imaging, synergid-endosperm fusion had occurred at the two endosperm nuclei stage. It would be valuable to support the observation by citing Maruyama's previous work.

      Weakness

      As the authors suggested, the higher triploid frequency observed in ecs1ecs2 than WT was likely caused by the increased polyspermy. However, it could also be that a reduction of normal seed number in ecs1ecs2 (whether due to failure of fertilization or embryo development arrest) accounts for the increased frequency of triploids compared to WT.

      The results in Figure 3C-E suggested single fertilization of the egg and of the central cell at similar frequencies. This is an exciting result, but it is still possible that the fertilized egg or central cell degenerated after fertilization, resulting in the disappearance of paternally inherited fluorescence. Evaluation of fertilization patterns at 7-10 HAP in the ecs1ecs2 mutant may provide more confident insight, although unfused sperm cells were evaluated at 1 DAP (Figure 3-figure supplement 1B). The fertilization states can be distinguished depending on the HTR10RFP sperm nuclei morphology and their positions, as reported by Takahashi et al (2018).

      Thank you for your suggestion. We added the requested experiment (see Figure 3B in the revised manuscript). In addition, we conducted a dual pollination experiment that provides evidence for the activation of the fertilization recovery machinery (Figure 3C-E; see above).

      Several recent studies have reported exciting insights on ECS1&2 functions; however, various results from different laboratories have raised controversy. Though, the commonly found feature is the repression of polytubey. For readers, it would be helpful to organize the explanation about which insights are concordant or different.

      Thank you for your suggestion. We now indicate, using terms such as "in line with" or "in contrast to", where our data confirm or contradict previous data.

      In addition, a drawing that explains the time course in the process from pollination to seed development (up to 6DAP) based on WT would help to understand which point is evaluated in each data.

      Thank you for your suggestion. We added a model figure (Figure 4E) at the end of the manuscript that brings the concepts together and aids understanding.

      Reviewer #3 (Public Review):

      In this manuscript, Mao et al. reported that the two proteases ECS1 and ECS2 participate in both polyspermy block and gamete fusion in Arabidopsis thaliana. The authors could observe polytubey phenotype which has been reported previously and obtain both triparental plants and haploids in ecs1 ecs2 mutants. Therefore, they proposed that the triparental plants resulted from the polytubey block defect, whereas the haploids were caused by the gamete fusion defect. Together with two other previous reports, I think it is very interesting to see these two proteases participating in so many different but connected processes. Although they did not provide the molecular mechanism of how ECS participated in polyspermy block and gamete fusion, their findings provide more options for and thus promote plant breeding. The work may have a wide application in the future and will be of broad interest to cell biologists working on gamete fusion and plant breeders.

      We thank the reviewer for their positive comments.

      Although most of the conclusions in this paper are well supported by the data, it could be improved with a minor revision including providing clearer data analysis and descriptions, images with higher resolution, and more discussions.

    1. Author Response

      Reviewer #2 (Public Review):

      Here, a simple model of cerebellar computation is used to study the dependence of task performance on input type: it is demonstrated that task performance and optimal representations are highly dependent on task and stimulus type. This challenges many standard models which use simple random stimuli and concludes that the granular layer is required to provide a sparse representation. This is a useful contribution to our understanding of cerebellar circuits, though, in common with many models of this type, the neural dynamics and circuit architecture are not very specific to the cerebellum, the model includes the feedforward structure and the high dimension of the granule layer, but little else. This paper has the virtue of including tasks that are more realistic, but by the paper’s own admission, the same model can be applied to the electrosensory lateral line lobe and it could, though it is not mentioned in the paper, be applied to the dentate gyrus and large pyramidal cells of CA3. The discussion does not include specific elements related to, for example, the dynamics of the Purkinje cells or the role of Golgi cells, and, in a way, the demonstration that the model can encompass different tasks and stimuli types is an indication of how abstract the model is. Nonetheless, it is useful and interesting to see a generalization of what has become a standard paradigm for discussing cerebellar function.

      We appreciate the Reviewer’s positive comments. Regarding the simplifications of our model, we agree that we have taken a modeling approach that abstracts away certain details to permit comparisons across systems. We now include an in-depth discussion of our simplifying assumptions (Assumptions & Extensions section in the Discussion) and have further noted the possibility that other biophysical mechanisms we have not accounted for may also underlie differences across systems.

      Our results predict that qualitative differences in the coding levels of cerebellum-like systems, across brain regions or across species, reflect an optimization to distinct tasks (Figure 7). However, it is also possible that differences in coding level arise from other physiological differences between systems.

      Reviewer #3 (Public Review):

      1) The paper by Xie et al is a modelling study of the mossy fiber-to-granule cell-to-Purkinje cell network, reporting that the optimal type of representations in the cerebellar granule cell layer depends on the type task. The paper stresses that the findings indicate a higher overall bias towards dense representations than stated in the literature, but it appears the authors have missed parts of the literature that already reported on this. While the modelling and analysis appear mathematically solid, the model is lacking many known constraints of the cerebellar circuitry, which makes the applicability of the findings to the biological counterpart somewhat limited.

      We thank the Reviewer for suggesting additional references to include in our manuscript, and for encouraging us to extend our model toward greater biological plausibility and more critically discuss simplifying assumptions we have made. We respond to both the comment about previous literature and about applicability to cerebellar circuitry in detail below.

      2) I have some concerns with the novelty of the main conclusion, here from the abstract: ‘Here, we generalize theories of cerebellar learning to determine the optimal granule cell representation for tasks beyond random stimulus discrimination, including continuous input-output transformations as required for smooth motor control. We show that for such tasks, the optimal granule cell representation is substantially denser than predicted by classic theories.’ Stated like this, this has in principle already been shown, for example: Spanne and Jörntell (2013) Processing of multi-dimensional sensorimotor information in the spinal and cerebellar neuronal circuitry: a new hypothesis. PLoS Comput Biol. 9(3):e1002979. Indeed, even the 2 DoF arm movement control that is used in the present paper as an application, was used in this previous paper, with similar conclusions with respect to the advantage of continuous input-output transformations and dense coding. Thus, already from the beginning of this paper, the novelty aspect of this paper is questionable. Even the conclusion in the last paragraph of the Introduction: ‘We show that, when learning input-output mappings for motor control tasks, the optimal granule cell representation is much denser than predicted by previous analyses.’ was in principle already shown by this previous paper.

      We thank the Reviewer for drawing our attention to Spanne and Jörntell (2013). Our study shares certain similarities with this work, including the consideration of tasks with smooth input-output mappings, such as learning the dynamics of a two-joint arm. However, our study differs substantially, most notably in that we parametrically vary the degree of sparsity in the granule cell layer to determine the circumstances under which dense versus sparse coding is optimal. To the best of our knowledge, no result in Spanne and Jörntell (2013) indicates the performance of a network as a function of average coding level. Instead, Spanne and Jörntell (2013) propose that inhibition from Golgi cells produces heterogeneity in coding level, which can improve performance; this is an interesting finding that is complementary to ours. We therefore do not believe that the quantitative computations of optimal coding level that we present are redundant with the results of this previous study. We also note that a key contribution of our study is a mathematical analysis of the inductive bias of networks with different coding levels, which supports our conclusions.

      We have included a discussion of Spanne and Jörntell (2013, 2015) in the revised version of our manuscript:

      "Other studies have considered tasks with smooth input-output mappings and low-dimensional inputs, finding that heterogeneous Golgi cell inhibition can improve performance by diversifying individual granule cell thresholds (Spanne and Jörntell, 2013). Extending our model to include heterogeneous thresholds is an interesting direction for future work. Another proposal states that dense coding may improve generalization (Spanne and Jörntell, 2015). Our theory reveals that whether or not dense coding is beneficial depends on the task."

      3) However, the present paper does add several more specific investigations/characterizations that were not previously explored. Many of the main figures report interesting new model results. However, the model is implemented in a highly generic fashion. Consequently, the model relates better to general neural network theory than to specific interpretations of the function of the cerebellar neuronal circuitry. One good example is the findings reported in Figure 2. These represent an interesting extension to the main conclusion, but they are also partly based on arbitrariness as the type of mossy fiber input described in the random categorization task has not been observed in the mammalian cerebellum under behavior in vivo, whereas in contrast, the type of input for the motor control task does resemble mossy fiber input recorded under behavior (van Kan et al 1993).

      We agree that the tasks we consider in Figure 2 are simplified compared to those that we consider elsewhere in the paper. The choice of random mossy fiber input was made to provide a comparison to previous modeling studies that also use random input as a benchmark (Marr, 1969; Albus, 1971; Brunel et al., 2004; Babadi and Sompolinsky, 2014; Billings et al., 2014; Litwin-Kumar et al., 2017). This baseline permits us to specifically evaluate the effects of low-dimensional inputs (Figure 2) and richer input-output mappings (Figure 2, Figure 7). We agree with the Reviewer that the random and uncorrelated mossy fiber activity that has been extensively used in previous studies is almost certainly an unrealistic idealization of in vivo neural activity; this is a motivating factor for our study, which relaxes this assumption and examines the consequences. To provide additional context, we have updated the following paragraph in the main text Results section:

      "A typical assumption in computational theories of the cerebellar cortex is that inputs are randomly distributed in a high-dimensional space (Marr, 1969; Albus, 1971; Brunel et al., 2004; Babadi and Sompolinsky, 2014; Billings et al., 2014; Litwin-Kumar et al., 2017). While this may be a reasonable simplification in some cases, many tasks, including cerebellum-dependent tasks, are likely best described as being encoded by a low-dimensional set of variables. For example, the cerebellum is often hypothesized to learn a forward model for motor control (Wolpert et al., 1998), which uses sensory input and motor efference to predict an effector’s future state. Mossy fiber activity recorded in monkeys correlates with position and velocity during natural movement (van Kan et al., 1993). Sources of motor efference copies include motor cortex, whose population activity lies on a low-dimensional manifold (Wagner et al., 2019; Huang et al., 2013; Churchland et al., 2010; Yu et al., 2009). We begin by modeling the low dimensionality of inputs and later consider more specific tasks."

      4) The overall conclusion states: ‘Our results....suggest that optimal cerebellar representations are task-dependent.’ This is not a particularly strong or specific conclusion. One could interpret this statement as simply saying: ‘if I construct an arbitrary neural network, with arbitrary intrinsic properties in neurons and synapses, I can get outputs that depend on the intensity of the input that I provide to that network.’ Further, the last sentence of the Introduction states: ‘More broadly, we show that the sparsity of a neural code has a task-dependent influence on learning...’ This is very general and unspecific, and would likely not come as a surprise to anyone interested in the analysis of neural networks. It doesn’t pinpoint any specific biological problem but just says that if I change the density of the input to a [generic] network, then the learning will be impacted in one way or another.

      We agree with the Reviewer that our conclusions are quite general, and we have removed the final sentence as we agree it was unspecific. However, we disagree with the Reviewer’s paraphrasing of our results.

      First, we do not select arbitrary intrinsic properties of neurons and synapses. Rather, we construct a simplified model with a key quantity, the neuronal threshold, that we vary parametrically in order to assess the effect of the resulting changes in the representation on performance. Second, we do not vary the intensity/density of inputs provided to the network – this is fixed throughout our study for all key comparisons we perform. Instead, we vary the density (coding level) of the expansion layer representation and quantify its effect on inductive bias and generalization. Finally, our study’s key contribution is an explanation of the heterogeneity in average coding level observed across behaviors and cerebellum-like systems. We go beyond the empirical statement that there is a dependence of performance on the parameter that we vary by developing an analytical theory. Our theory describes the performance of the class of networks that we study and the properties of learning tasks that determine the optimal expansion layer representation.
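
      The threshold sweep described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: it assumes rectified-linear expansion-layer units with Gaussian random weights, and the sizes `N`, `M`, `P` and all helper names are hypothetical. It shows only how a shared threshold sets the coding level while the inputs stay fixed.

```python
import random


def relu(u):
    """Rectified linear activation: max(u, 0)."""
    return max(u, 0.0)


def expansion_activity(J, x, theta):
    """Expansion-layer activity relu(J x - theta), with one threshold shared by all neurons."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) - theta) for row in J]


def coding_level(J, inputs, theta):
    """Average fraction of expansion-layer neurons that are active (activity > 0)."""
    active = 0
    total = 0
    for x in inputs:
        h = expansion_activity(J, x, theta)
        active += sum(1 for v in h if v > 0.0)
        total += len(h)
    return active / total


rng = random.Random(0)
N, M, P = 20, 200, 100  # hypothetical sizes: input units, expansion units, input patterns
J = [[rng.gauss(0.0, 1.0 / N ** 0.5) for _ in range(N)] for _ in range(M)]
inputs = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(P)]

# The inputs are identical for every threshold; only the expansion-layer density changes.
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(coding_level(J, inputs, theta), 3))
```

      Raising the threshold lowers the coding level, which is the single quantity varied in the comparison described above.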

      To clarify our main contributions, we have updated the final paragraph of the Introduction. We have also removed the sentence that the Reviewer objects to, as it was less specific than the other points we make here.

      "We propose that these differences can be explained by the capacity of representations with different levels of sparsity to support learning of different tasks. We show that the optimal level of sparsity depends on the structure of the input-output relationship of a task. When learning input-output mappings for motor control tasks, the optimal granule cell representation is much denser than predicted by previous analyses. To explain this result, we develop an analytic theory that predicts the performance of cerebellum-like circuits for arbitrary learning tasks. The theory describes how properties of cerebellar architecture and activity control these networks’ inductive bias: the tendency of a network toward learning particular types of input-output mappings (Sollich, 1998; Jacot et al., 2018; Bordelon et al., 2020; Canatar et al., 2021; Simon et al., 2021). The theory shows that inductive bias, rather than the dimension of the representation alone, is necessary to explain learning performance across tasks. It also suggests that cerebellar regions specialized for different functions may adjust the sparsity of their granule cell representations depending on the task."

      5) The interpretation of the distribution of the mossy fiber inputs to the granule cells, which would have a crucial impact on the results of a study like this, is likely incorrect. First, unlike the papers that the authors cite, there are many studies indicating that there is a topographic organization in the mossy fiber termination, such that mossy fibers from the same inputs, representing similar types of information, are regionally co-localized in the granule cell layer. Hence, there is no support for the model assumption that there is a predominantly random termination of mossy fibers of different origins. This risks invalidating the comparisons that the authors are making, i.e. such as in Figure 3. This is a list of example papers, there are more: van Kan, Gibson and Houk (1993) Movement-related inputs to intermediate cerebellum of the monkey. Journal of Neurophysiology. Garwicz et al (1998) Cutaneous receptive fields and topography of mossy fibres and climbing fibres projecting to cat cerebellar C3 zone. The Journal of Physiology. Brown and Bower (2001) Congruence of mossy fiber and climbing fiber tactile projections in the lateral hemispheres of the rat cerebellum. The Journal of Comparative Neurology. Na, Sugihara, Shinoda (2019) The entire trajectories of single pontocerebellar axons and their lobular and longitudinal terminal distribution patterns in multiple aldolase C-positive compartments of the rat cerebellar cortex. The Journal of Comparative Neurology.

      6) The nature of the mossy fiber-granule cell recording is also reviewed here: Gilbert and Miall (2022) How and Why the Cerebellum Recodes Input Signals: An Alternative to Machine Learning. The Neuroscientist. Further, considering the re-coding idea, the following paper shows that detailed information, as it is provided by mossy fibers, is transmitted through the granule cells without any evidence of re-coding: Jörntell and Ekerot (2006) Journal of Neuroscience; and this paper shows that these granule inputs are powerfully transmitted to the molecular layer even in a decerebrated animal (i.e. where only the ascending sensory pathways remain): Jörntell and Ekerot (2002) Neuron.

      We agree that there is strong evidence for a topographic organization in mossy fiber to granule cell connectivity at the microzonal level. We thank the Reviewer for pointing us to specific examples. We acknowledge that our simplified model does not capture the structure of connectivity observed in these studies.

      However, the focus of our model is on cerebellar neurons presynaptic to a single Purkinje cell. Random or disordered distribution of inputs at this local scale is compatible with topographic organization at the microzonal scale. Furthermore, while there is evidence of structured connections at the local scale, models with random connectivity are able to reproduce the dimensionality of granule cell activity within a small margin of error (Nguyen et al., 2022). Finally, our finding that dense codes are optimal for learning slowly varying tasks is consistent with evidence for the lack of re-coding: for such tasks, re-coding may be absent because it is not required.

      We have devoted a section to this issue in the Assumptions and Extensions portion of our Discussion:

      "Another key assumption concerning the granule cells is that they sample mossy fiber inputs randomly, as is typically assumed in Marr-Albus models (Marr, 1969; Albus, 1971; Litwin-Kumar et al., 2017; Cayco-Gajic et al., 2017). Other studies instead argue that granule cells sample from mossy fibers with highly similar receptive fields (Garwicz et al., 1998; Brown and Bower, 2001; Jörntell and Ekerot, 2006) defined by the tuning of mossy fiber and climbing fiber inputs to cerebellar microzones (Apps et al., 2018). This has led to an alternative hypothesis that granule cells serve to relay similarly tuned mossy fiber inputs and enhance their signal-to-noise ratio (Jörntell and Ekerot, 2006; Gilbert and Miall, 2022) rather than to re-encode inputs. Another hypothesis is that granule cells enable Purkinje cells to learn piece-wise linear approximations of nonlinear functions (Spanne and Jörntell, 2013). However, several recent studies support the existence of heterogeneous connectivity and selectivity of granule cells to multiple distinct inputs at the local scale (Huang et al., 2013; Ishikawa et al., 2015). Furthermore, the deviation of the predicted dimension in models constrained by electron-microscopy data as compared to randomly wired models is modest (Nguyen et al., 2022). Thus, topographically organized connectivity at the macroscopic scale may coexist with disordered connectivity at the local scale, allowing granule cells presynaptic to an individual Purkinje cell to sample heterogeneous combinations of the subset of sensorimotor signals relevant to the tasks that Purkinje cell participates in. Finally, we note that the optimality of dense codes for learning slowly varying tasks in our theory suggests that observations of a lack of mixing (Jörntell and Ekerot, 2002) for such tasks are compatible with Marr-Albus models, as in this case nonlinear mixing is not required."

      7) I could not find any description of the neuron model used in this paper, so I assume that the neurons are just modelled as linear summators with a threshold (in fact, Figure 5 mentions inhibition, but this appears to be just one big lump inhibition, which basically is an incorrect implementation). In reality, granule cells of course do have specific properties that can impact the input-output transformation, PARTICULARLY with respect to the comparison of sparse versus dense coding, because the low-pass filtering of input that occurs in granule cells (and other neurons) as well as their spike firing stochasticity (Saarinen et al (2008). Stochastic differential equation model for cerebellar granule cell excitability. PLoS Comput. Biol. 4:e1000004) will profoundly complicate these comparisons and make them less straightforward than what is portrayed in this paper. There are also several other factors that would be present in the biological setting but are lacking here, which makes it doubtful how much information in relation to the biological performance that this modelling study provides: What are the types of activity patterns of the inputs? What are the learning rules? What is the topography? What is the impact of Purkinje cell outputs downstream? The Purkinje cell output does not have any direct action; it acts on the deep cerebellar nuclear neurons, which in turn act on a complex sensorimotor circuitry to exert their effect. Hence, predictive coding could only become interpretable after the PC output has been added to the activity in those circuits. Where is the differentiated Golgi cell inhibition?

      Thank you for these critiques. We have made numerous edits to improve the presentation of the details of our model in the main text of the manuscript. Indeed, granule cells in the main text are modeled as linear sums of mossy fiber inputs with a threshold-linear activation function. A more detailed description of the model for granule cells can now be found in Equation 1 in the Results section:

      "The activity of neurons in the expansion layer is given by: $h = \phi(J^{\mathrm{eff}} x - \theta)$ (1), where $\phi$ is a rectified linear activation function $\phi(u) = \max(u, 0)$ applied element-wise. Our results also hold for other threshold-polynomial activation functions. The scalar threshold $\theta$ is shared across neurons and controls the coding level, which we denote by $f$, defined as the average fraction of neurons in the expansion layer that are active."

      Most of our analyses use the firing rate model we describe above, but several Supplemental Figures show extensions to this model. As we mention in the Discussion, our results do not depend on the specific choice of nonlinearity (Figure 2-figure supplement 2). We have also considered the possibility that the stochastic nature of granule cell spikes could impact our measures of coding level. In Figure 7-figure supplement 1 we test the robustness of our main conclusion using a spiking model where we model granule cell spikes with Poisson statistics. When measuring coding level in a population of spiking neurons, a key question is over what time window the Purkinje cell integrates spikes. For several choices of integration time window, we show that dense coding remains optimal for learning smooth tasks. However, we agree with the Reviewer that there are other biological details our model does not address. For example, our spiking model does not capture some of the properties that the Saarinen et al. (2008) model captures, including random sub-threshold oscillations and clusters of spikes. Modeling biophysical phenomena at this scale is beyond the scope of our study. We have added this reference to the relevant section of the Discussion:

      "We also note that coding level is most easily defined when neurons are modeled as rate, rather than spiking units. To investigate the consistency of our results under a spiking code, we implemented a model in which granule cell spiking exhibits Poisson variability and quantified coding level as the fraction of neurons that have nonzero spike counts (Figure 7-figure supplement 1; Figure 7C). In general, increased spike count leads to improved performance as noise associated with spiking variability is reduced. Granule cells have been shown to exhibit reliable burst responses to mossy fiber stimulation (Chadderton et al., 2004), motivating models using deterministic responses or sub-Poisson spiking variability. However, further work is needed to quantitatively compare variability in model and experiment and to account for more complex biophysical properties of granule cells (Saarinen et al., 2008)."

      A second concern the Reviewer raises is our implementation of Golgi cell inhibition as a homogeneous rather than heterogeneous input onto granule cells. In simplified models, adding heterogeneous inhibition does not dramatically change the qualitative properties of the expansion layer representation, in particular the dimensionality of the representation (Billings et al., 2014, Cayco-Gajic et al., 2017, Litwin-Kumar et al., 2017). We have added a section about inhibition to our Discussion:

      "We also have not explicitly modeled inhibitory input provided by Golgi cells, instead assuming such input can be modeled as a change in effective threshold, as in previous studies (Billings et al., 2014; Cayco-Gajic et al., 2017; Litwin-Kumar et al., 2017). This is appropriate when considering the dimension of the granule cell representation (Litwin-Kumar et al., 2017), but more work is needed to extend our model to the case of heterogeneous inhibition."

      Regarding the mossy fiber inputs, as we state in response to paragraph 3, we agree with the Reviewer that the random and uncorrelated mossy fiber activity that has been used in previous studies is an unrealistic idealization of in vivo neural activity. One of the motivations for our model was to relax this assumption and examine the consequences: we introduce correlations in the mossy fiber activity by projecting low-dimensional patterns into the mossy fiber layer (Figure 1B):

      "A typical assumption in computational theories of the cerebellar cortex is that inputs are randomly distributed in a high-dimensional space (Marr, 1969; Albus, 1971; Brunel et al., 2004; Babadi and Sompolinsky, 2014; Billings et al., 2014; Litwin-Kumar et al., 2017). While this may be a reasonable simplification in some cases, many tasks, including cerebellum-dependent tasks, are likely best described as being encoded by a low-dimensional set of variables. For example, the cerebellum is often hypothesized to learn a forward model for motor control (Wolpert et al., 1998), which uses sensory input and motor efference to predict an effector’s future state. Mossy fiber activity recorded in monkeys correlates with position and velocity during natural movement (van Kan et al., 1993). Sources of motor efference copies include motor cortex, whose population activity lies on a low-dimensional manifold (Wagner et al., 2019; Huang et al., 2013; Churchland et al., 2010; Yu et al., 2009). We begin by modeling the low dimensionality of inputs and later consider more specific tasks.

      We therefore assume that the inputs to our model lie on a D-dimensional subspace embedded in the N-dimensional input space, where D is typically much smaller than N (Figure 1B). We refer to this subspace as the “task subspace” (Figure 1C)."
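
      As a rough illustration of this assumption (a sketch, not the authors' implementation), a D-dimensional task variable can be placed in an N-dimensional input space with a fixed linear map. The matrix `A`, the Gaussian choice of its entries, and the dimensions used here are all hypothetical.

```python
import random


def embed(A, x):
    """Map a D-dimensional task variable x into N-dimensional input space via A (N rows, D columns)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


rng = random.Random(1)
D, N = 3, 50  # hypothetical dimensions, with D much smaller than N
A = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)]

x = [rng.gauss(0.0, 1.0) for _ in range(D)]
m = embed(A, x)  # input-layer pattern; every such pattern lies in the column space of A
print(len(x), len(m))
```

      Because every input pattern is a linear combination of the D columns of `A`, the patterns span at most D dimensions however large N is, so activity across input units is correlated.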

      The Reviewer also mentions the learning rule at granule cell to Purkinje cell synapses. We agree that considering online, climbing-fiber-dependent learning is an important generalization. We therefore added a new supplemental figure investigating whether we would still see a difference in optimal coding levels across tasks if online learning were used instead of the least squares solution (Figure 7-figure supplement 2). Indeed, we observed a similar task dependence as we saw in Figure 2F. We have added a new paragraph in the Discussion under Assumptions and Extensions describing our rationale and approach in detail:

      "For the Purkinje cells, our model assumes that their responses to granule cell input can be modeled as an optimal linear readout. Our model therefore provides an upper bound to linear readout performance, a standard benchmark for the quality of a neural representation that does not require assumptions on the nature of climbing fiber-mediated plasticity, which is still debated. Electrophysiological studies have argued in favor of a linear approximation (Brunel et al., 2004). To improve the biological applicability of our model, we implemented an online climbing fiber-mediated learning rule and found that optimal coding levels are still task-dependent (Figure 7-figure supplement 2). We also note that although we model several timing-dependent tasks (Figure 7), our learning rule does not exploit temporal information, and we assume that temporal dynamics of granule cell responses are largely inherited from mossy fibers. Integrating temporal information into our model is an interesting direction for future investigation."

      Finally, regarding the function of the Purkinje cell, our model defines a learning task as a mapping from inputs to target activity in the Purkinje cell and is thus agnostic to the cell’s downstream effects. We clarify this point when introducing the definition of a learning task:

      "In our model, a learning task is defined by a mapping from task variables x to an output f(x), representing a target change in activity of a readout neuron, for example a Purkinje cell. The limited scope of this definition implies our results should not strongly depend on the influence of the readout neuron on downstream circuits."

      8) The problem of these, in my impression, generic, arbitrary settings of the neurons and the network in the model becomes obvious here: ‘In contrast to the dense activity in cerebellar granule cells, odor responses in Kenyon cells, the analogs of granule cells in the Drosophila mushroom body, are sparse...’ How can this system be interpreted as an analogy to granule cells in the mammalian cerebellum when the model does not address the specifics lined up above? I.e. the ‘inductive bias’ that the authors speak of, defined as ‘the tendency of a network toward learning particular types of input-output mappings’, would be highly dependent on the specifics of the network model.

      We agree with the Reviewer that our model makes several simplifying assumptions for mathematical tractability. However, we note that our study is not the first to draw analogies between cerebellum-like systems, including the mushroom body (Bell et al., 2008; Farris, 2011). All the systems we study feature a sparsely connected, expanded granule-like layer that sends parallel fiber axons onto densely connected downstream neurons known to exhibit powerful synaptic plasticity, thus motivating the key architectural assumptions of our model. We have constrained anatomical parameters of the model using data as available (Table 1). However, we agree with the Reviewer that when making comparisons across species there is always a possibility that differences are due to physiological mechanisms we have not fully understood or captured with a model. As such, we can only present a hypothesis for these differences. We have modified our Discussion section on this topic to clearly state this.

      "Our results predict that qualitative differences in the coding levels of cerebellum-like systems, across brain regions or across species, reflect an optimization to distinct tasks (Figure 7). However, it is also possible that differences in coding level arise from other physiological differences between systems."

      9) More detailed comments: Abstract: ‘In these models [Marr-Albus], granule cells form a sparse, combinatorial encoding of diverse sensorimotor inputs. Such sparse representations are optimal for learning to discriminate random stimuli.’ Yes, I would agree with the first part, but I contest the second part of this statement. I think what is true for sparse coding is that the learning of random stimuli will be faster, as in a perceptron, but not necessarily better. As the sparsification essentially removes information, it could be argued that the quality of the learning is poorer. So from that perspective, it is not optimal. The authors need to specify from what perspective they consider sparse representations optimal for learning.

      This is an important point that we would like to clarify. It is not the case that sparse coding simply speeds up learning. In our study and many related works (Barak et al. 2013; Babadi and Sompolinsky 2014; Litwin-Kumar et al. 2017), learning performance is measured based on the generalization ability of the network – the ability to predict correct labels for previously unseen inputs. As our study and previous studies show, sparse codes are optimal in the sense that they minimize generalization error, independent of any effect on learning speed. To communicate this more effectively, we have added the following sentence to the first paragraph of the Introduction:

      "Sparsity affects both learning speed (Cayco-Gajic et al., 2017), and generalization, the ability to predict correct labels for previously unseen inputs (Barak et al., 2013; Babadi and Sompolinsky, 2014; Litwin-Kumar et al., 2017)."

      10) Introduction: ‘Indeed, several recent studies have reported dense activity in cerebellar granule cells in response to sensory stimulation or during motor control tasks (Knogler et al., 2017; Wagner et al., 2017; Giovannucci et al., 2017; Badura and De Zeeuw, 2017; Wagner et al., 2019), at odds with classic theories (Marr, 1969; Albus, 1971).’ In fact, this was precisely the issue that was addressed already by Jörntell and Ekerot (2006) Journal of Neuroscience. The conclusion was that these actual recordings of granule cells in vivo provided essentially no support for the assumptions in the Marr-Albus theories.

      In our reading, the main finding of Jörntell and Ekerot (2006) is that individual granule cells are activated by mossy fibers with overlapping receptive fields driven by a single type of somatosensory input. However, there is also evidence of nonlinear mixed selectivity in granule cells in support of the re-coding hypothesis (Huang et al., 2013; Ishikawa et al., 2015). Jörntell and Ekerot (2006) also suggest that the granule cell layer shares similar topographic organization as mossy fibers, organized into microzones. The existence of topographic organization does not invalidate Marr-Albus theories. As we have suggested earlier, a local combinatorial expansion can coexist with a global topographic organization.

      We have described these considerations in the Assumptions and Extensions portion of the Discussion:

      "Another key assumption concerning the granule cells is that they sample mossy fiber inputs randomly, as is typically assumed in Marr-Albus models (Marr, 1969; Albus, 1971; Litwin-Kumar et al., 2017; Cayco-Gajic et al., 2017). Other studies instead argue that granule cells sample from mossy fibers with highly similar receptive fields (Garwicz et al., 1998; Brown and Bower, 2001; Jörntell and Ekerot, 2006) defined by the tuning of mossy fiber and climbing fiber inputs to cerebellar microzones (Apps et al., 2018). This has led to an alternative hypothesis that granule cells serve to relay similarly tuned mossy fiber inputs and enhance their signal-to-noise ratio (Jörntell and Ekerot, 2006; Gilbert and Chris Miall, 2022) rather than to re-encode inputs. Another hypothesis is that granule cells enable Purkinje cells to learn piece-wise linear approximations of nonlinear functions (Spanne and Jörntell, 2013). However, several recent studies support the existence of heterogeneous connectivity and selectivity of granule cells to multiple distinct inputs at the local scale (Huang et al., 2013; Ishikawa et al., 2015). Furthermore, the deviation of the predicted dimension in models constrained by electron-microscopy data as compared to randomly wired models is modest (Nguyen et al., 2022). Thus, topographically organized connectivity at the macroscopic scale may coexist with disordered connectivity at the local scale, allowing granule cells presynaptic to an individual Purkinje cell to sample heterogeneous combinations of the subset of sensorimotor signals relevant to the tasks that Purkinje cell participates in. Finally, we note that the optimality of dense codes for learning slowly varying tasks in our theory suggests that observations of a lack of mixing (Jörntell and Ekerot, 2002) for such tasks are compatible with Marr-Albus models, as in this case nonlinear mixing is not required."

      We have also included the Jörntell and Ekerot (2006) study as a citation in the Introduction:

      "Indeed, several recent studies have reported dense activity in cerebellar granule cells in response to sensory stimulation or during motor control tasks (Jörntell and Ekerot, 2006; Knogler et al., 2017; Wagner et al., 2017; Giovannucci et al., 2017; Badura and De Zeeuw, 2017; Wagner et al., 2019), at odds with classic theories (Marr, 1969; Albus, 1971)."

      11) Results: 1st para: There is no information about how the granule cells are modelled.

      We agree that this information should have been more readily available. We now more completely describe the model in the main text. Our model for granule cells can be found in Equation 1 in the Results section and also the Methods (Network Model):

      "The activity of neurons in the expansion layer is given by: h = φ(J^eff x − θ), (2)

      where φ is a rectified linear activation function φ(u) = max(u,0) applied element-wise. Our results also hold for other threshold-polynomial activation functions. The scalar threshold θ is shared across neurons and controls the coding level, which we denote by f, defined as the average fraction of neurons in the expansion layer that are active."
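
      The quoted definition can be illustrated with a minimal sketch. It assumes a Gaussian random weight matrix and Gaussian inputs, which are illustrative choices rather than the authors' settings, and the names `expansion_layer`, `coding_level`, `J_eff`, `N`, `M` and `P` are hypothetical. It computes the rectified expansion-layer activity and estimates the coding level f as the average fraction of active neurons for a few values of the shared threshold θ.

```python
import random


def expansion_layer(J_eff, x, theta):
    """Return h = phi(J_eff x - theta), with phi(u) = max(u, 0) applied element-wise."""
    return [max(sum(w * xi for w, xi in zip(row, x)) - theta, 0.0) for row in J_eff]


def coding_level(J_eff, inputs, theta):
    """Average fraction of expansion-layer neurons that are active (h > 0)."""
    active = 0
    total = 0
    for x in inputs:
        h = expansion_layer(J_eff, x, theta)
        active += sum(1 for hi in h if hi > 0)
        total += len(h)
    return active / total


# Hypothetical sizes: N inputs, M expansion-layer neurons, P input patterns.
rng = random.Random(0)
N, M, P = 10, 200, 100
J_eff = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
inputs = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(P)]

# Raising the shared threshold can only switch neurons off, so f does not increase.
for theta in (0.0, 2.0, 4.0):
    print(theta, coding_level(J_eff, inputs, theta))
```

      In this sketch a single scalar θ is the only parameter that moves the representation between dense and sparse, which mirrors the role the quoted passage assigns to the threshold.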

      12) 2nd para: ‘A typical assumption in computational theories of the cerebellar cortex is that inputs are randomly distributed in a high-dimensional space.’ Yes, I agree, and this is in fact in conflict with the known topographical organization in the cerebellar cortex (see broader comment above). Mossy fiber inputs coding for closely related inputs are co-localized in the cerebellar cortex. I think for this model to be of interest from the point of view of the mammalian cerebellar cortex, it would need to pay more attention to this organizational feature.

      As we discuss in our response to paragraphs 5 and 6, we see the random distribution assumption at the local scale (inputs presynaptic to a single Purkinje cell) as being compatible with topographic organization occurring at the microzone scale. Furthermore, as discussed earlier, we specifically model low-dimensional input as opposed to the random and high-dimensional inputs typically studied in prior models.

      "A typical assumption in computational theories of the cerebellar cortex is that inputs are randomly distributed in a high-dimensional space (Marr, 1969; Albus, 1971; Brunel et al., 2004; Babadi and Sompolinsky, 2014; Billings et al., 2014; Litwin-Kumar et al., 2017). While this may be a reasonable simplification in some cases, many tasks, including cerebellum-dependent tasks, are likely best-described as being encoded by a low-dimensional set of variables. For example, the cerebellum is often hypothesized to learn a forward model for motor control (Wolpert et al., 1998), which uses sensory input and motor efference to predict an effector’s future state. Mossy fiber activity recorded in monkeys correlates with position and velocity during natural movement (van Kan et al., 1993). Sources of motor efference copies include motor cortex, whose population activity lies on a low-dimensional manifold (Wagner et al., 2019; Huang et al., 2013; Churchland et al., 2010; Yu et al., 2009). We begin by modeling the low dimensionality of inputs and later consider more specific tasks. We therefore assume that the inputs to our model lie on a D-dimensional subspace embedded in the N-dimensional input space, where D is typically much smaller than N (Figure 1B). We refer to this subspace as the “task subspace” (Figure 1C)."

      References

      Albus, J.S. (1971). A theory of cerebellar function. Mathematical Biosciences 10, 25–61.

      Apps, R., et al. (2018). Cerebellar Modules and Their Role as Operational Cerebellar Processing Units. Cerebellum 17, 654–682.

      Babadi, B. and Sompolinsky, H. (2014). Sparseness and expansion in sensory representations. Neuron 83, 1213–1226.

      Badura, A. and De Zeeuw, C.I. (2017). Cerebellar granule cells: dense, rich and evolving representations. Current Biology 27, R415–R418.

      Barak, O., Rigotti, M., and Fusi, S. (2013). The sparseness of mixed selectivity neurons controls the generalization–discrimination trade-off. Journal of Neuroscience 33, 3844–3856.

      Bell, C.C., Han, V., and Sawtell, N.B. (2008). Cerebellum-like structures and their implications for cerebellar function. Annual Review of Neuroscience 31, 1–24.

      Billings, G., Piasini, E., Lőrincz, A., Nusser, Z., and Silver, R.A. (2014). Network structure within the cerebellar input layer enables lossless sparse encoding. Neuron 83, 960–974.

      Bordelon, B., Canatar, A., and Pehlevan, C. (2020). Spectrum dependent learning curves in kernel regression and wide neural networks. International Conference on Machine Learning 1024–1034.

      Brown, I.E. and Bower, J.M. (2001). Congruence of mossy fiber and climbing fiber tactile projections in the lateral hemispheres of the rat cerebellum. Journal of Comparative Neurology 429, 59–70.

      Brunel, N., Hakim, V., Isope, P., Nadal, J.P., and Barbour, B. (2004). Optimal information storage and the distribution of synaptic weights: perceptron versus Purkinje cell. Neuron 43, 745–757.

      Canatar, A., Bordelon, B., and Pehlevan, C. (2021). Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks. Nature Communications 12, 1–12.

      Cayco-Gajic, N.A., Clopath, C., and Silver, R.A. (2017). Sparse synaptic connectivity is required for decorrelation and pattern separation in feedforward networks. Nature Communications 8, 1–11.

      Chadderton, P., Margrie, T.W., and Häusser, M. (2004). Integration of quanta in cerebellar granule cells during sensory processing. Nature 428, 856–860.

      Churchland, M.M., et al. (2010). Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nature Neuroscience 13, 369–378.

      Farris, S.M. (2011). Are mushroom bodies cerebellum-like structures? Arthropod structure & development 40, 368–379.

      Garwicz, M., Jörntell, H., and Ekerot, C.F. (1998). Cutaneous receptive fields and topography of mossy fibres and climbing fibres projecting to cat cerebellar C3 zone. The Journal of Physiology 512 (Pt 1), 277–293.

      Gilbert, M. and Chris Miall, R. (2022). How and Why the Cerebellum Recodes Input Signals: An Alternative to Machine Learning. The Neuroscientist 28, 206–221.

      Giovannucci, A., et al. (2017). Cerebellar granule cells acquire a widespread predictive feedback signal during motor learning. Nature Neuroscience 20, 727–734.

      Huang, C.C., et al. (2013). Convergence of pontine and proprioceptive streams onto multimodal cerebellar granule cells. eLife 2, e00400.

      Ishikawa, T., Shimuta, M., and Häusser, M. (2015). Multimodal sensory integration in single cerebellar granule cells in vivo. eLife 4, e12916.

      Jacot, A., Gabriel, F., and Hongler, C. (2018). Neural tangent kernel: Convergence and generalization in neural networks. Advances in Neural Information Processing Systems 31.

      Jörntell, H. and Ekerot, C.F. (2002). Reciprocal Bidirectional Plasticity of Parallel Fiber Receptive Fields in Cerebellar Purkinje Cells and Their Afferent Interneurons. Neuron 34, 797–806.

      Jörntell, H. and Ekerot, C.F. (2006). Properties of Somatosensory Synaptic Integration in Cerebellar Granule Cells In Vivo. Journal of Neuroscience 26, 11786–11797.

      Knogler, L.D., Markov, D.A., Dragomir, E.I., Štih, V., and Portugues, R. (2017). Sensorimotor representations in cerebellar granule cells in larval zebrafish are dense, spatially organized, and non-temporally patterned. Current Biology 27, 1288–1302.

      Litwin-Kumar, A., Harris, K.D., Axel, R., Sompolinsky, H., and Abbott, L.F. (2017). Optimal degrees of synaptic connectivity. Neuron 93, 1153–1164.

      Marr, D. (1969). A theory of cerebellar cortex. Journal of Physiology 202, 437–470.

      Nguyen, T.M., et al. (2022). Structured cerebellar connectivity supports resilient pattern separation. Nature 1–7.

      Saarinen, A., Linne, M.L., and Yli-Harja, O. (2008). Stochastic Differential Equation Model for Cerebellar Granule Cell Excitability. PLOS Computational Biology 4, e1000004.

      Simon, J.B., Dickens, M., and DeWeese, M.R. (2021). A theory of the inductive bias and generalization of kernel regression and wide neural networks. arXiv: 2110.03922.

      Sollich, P. (1998). Learning curves for Gaussian processes. Advances in Neural Information Processing Systems 11.

      Spanne, A. and Jörntell, H. (2013). Processing of Multi-dimensional Sensorimotor Information in the Spinal and Cerebellar Neuronal Circuitry: A New Hypothesis. PLOS Computational Biology 9, e1002979.

      Spanne, A. and Jörntell, H. (2015). Questioning the role of sparse coding in the brain. Trends in Neurosciences 38, 417–427.

      van Kan, P.L., Gibson, A.R., and Houk, J.C. (1993). Movement-related inputs to intermediate cerebellum of the monkey. Journal of Neurophysiology 69, 74–94.

      Wagner, M.J., Kim, T.H., Savall, J., Schnitzer, M.J., and Luo, L. (2017). Cerebellar granule cells encode the expectation of reward. Nature 544, 96–100.

      Wagner, M.J., et al. (2019). Shared cortex-cerebellum dynamics in the execution and learning of a motor task. Cell 177, 669–682.e24.

      Wolpert, D.M., Miall, R.C., and Kawato, M. (1998). Internal models in the cerebellum. Trends in Cognitive Sciences 2, 338–347.

      Yu, B.M., et al. (2009). Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. Journal of Neurophysiology 102, 614–635.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Huang et al., assess cognitive flexibility in rats trained on an animal model of anorexia nervosa known as activity-based anorexia (ABA). For the first time, they do this in a way that is fully automated and free from experimenter interference, as apparently experimenter interference can affect both the development of ABA as well as the effect on behaviour. They show that animals that are more cognitively flexible (i.e. animals that had received reversal training) were better able to resist weight loss upon exposure to ABA, whereas animals exposed to ABA first show poorer cognitive flexibility (reversal performance).

      Strengths:

      • The development of a fully-automated, experimenter-free behavioural assessment paradigm that is capable of identifying individual rats and therefore tracking their performance.

      • The bidirectional nature of the study - i.e. the fact that animals were tested for cognitive flexibility both before and after exposure to ABA, so that direction of causality could be established.

      • The analyses are rigorous and the sample sizes sufficient.

      • The use of touchscreens increases the translational potential of the findings.

      Weaknesses

      • Some descriptions of methods and results are confusing or insufficiently detailed.

      We have been through all methods and results to include additional details as requested by this reviewer below.

      It seems to me that performance on the pairwise discrimination task cannot be directly (statistically) compared to performance on reversal (as in Figure 4E), as these are tapping into fundamentally different cognitive processes (discrimination versus reversal learning). I think comparing groups on each assessment is valid, however.

      We agree that discrimination and reversal are different cognitive processes, and statistical comparisons between these two components of the task were only made when examining the speed of learning in the validation of the novel testing system. Moreover, our inclusion of the pink and purple bars on graphs such as Figure 4C & 4E represent “main effects of ABA exposure”, regardless of learning phase (PD or reversal) rather than, as you describe, comparing PD to R1. Perhaps this comparison wasn’t clear, so we have amended the text to say ‘main effect of ABA exposure p=.0017’ rather than just “exposure”.

      Not necessarily a 'weakness' but I would have loved to see some assessment of the alterations in neural mechanisms underlying these effects, and/or some different behavioural assessments in addition to those used here. In particular, the authors mention in the discussion that this manipulation can affect cholinergic functioning in the dorsal striatum. We (Bradfield et al., Neuron, 2013) and a number of others have now demonstrated that cholinergic dysfunction in the dorsomedial striatum impairs a different kind of reversal learning that is based on alterations in outcome identity and thus relies on a different cognitive process (i.e. 'state' rather than 'reward' prediction error). It would be interesting perhaps in the future to see if the ABA manipulation also alters performance on this alternative 'cognitive flexibility' task.

      This is an excellent suggestion and we have already begun exploring this in other ongoing work in the laboratory. Due to ‘compulsive’ wheel running being a hallmark of ABA, we are interested in determining if this also translates to a goal-directed action impairment using the well-established outcome-specific devaluation task. Perhaps with ABA it may be more relevant to investigate outcome-reversals rather than stimulus-reversals, and if this is the case, it would further support the use of the ABA model for investigating cognitive dysfunction relevant to AN. We have included an additional section in the discussion text relating to our hypotheses regarding outcome-specific reversal learning in the ABA model.

      Nevertheless, I certainly think the manuscript provides a solid appraisal of cognitive flexibility using more traditional tasks, and that the authors have achieved their aims. I think the work here will be of importance, certainly to other researchers using the ABA model, but perhaps also of translational importance in the future, as the causal relationship between ABA and cognitive inflexibility is near impossible to establish using human studies, but here evidence points strongly towards this being the case.

      Reviewer #2 (Public Review):

      Huang and colleagues present data from experiments assessing the role of cognitive inflexibility in the vulnerability to weight loss in the activity-based anorexia paradigm in rats. The experiments employ a novel in-home cage touchscreen system. The home cage touch screen system allows reduced testing time and increased throughput compared with the more widely used systems resulting in the ability to assess ABA following testing cognitive flexibility in relatively young female rats. The data demonstrate that, contrary to expectations, cognitive inflexibility does not predispose to greater ABA weight loss, but instead, rats that performed better in the reversal learning task lost more weight in the ABA paradigm. Prior ABA exposure resulted in poorer learning of the task and reversal. An additional experiment demonstrated that rats that had been trained in reversal learning resisted weight loss in the ABA paradigm. The findings are important and are clearly presented. They have implications for anorexia nervosa both in terms of potentially identifying those at risk and in understanding the high rates of relapse.

      Thanks for a great summary of the manuscript.

      Reviewer #3 (Public Review):

      Activity-based anorexia (ABA), which combines access to a running wheel and restricted access to food, is a most common paradigm used to study anorexic behavior in rodents. And yet, the field has been plagued by persistent questions about its validity as a model of anorexia nervosa (AN) in humans. This group's previous studies supported the idea that the ABA paradigm captures cognitive inflexibility seen in AN. Here they describe a fully automated touchscreen cognitive testing system for rats that makes it possible to ask whether cognitive inflexibility predisposes individuals to severe weight loss in the ABA paradigm. They observed that cognitive inflexibility was predictive of resistance to weight loss in the ABA, the opposite of what was predicted. They also reported reciprocal effects of ABA and cognitive testing on subsequent performance in the other paradigm. Prior exposure to the ABA decreased subsequent cognitive performance, while prior exposure to the cognitive task promoted resistance to the ABA. Based on these findings, the authors argue that the ABA model can be used to identify novel therapeutic targets for AN.

      The strength of this manuscript is primarily as a methods paper describing a novel automated cognitive behavioral testing system that obviates the need for experimentalist handling and single housing, which can interfere with behavioral testing, and accelerates learning on the task. Together, these features make it feasible to perform longitudinal studies to ask whether cognitive performance is predictive of behavior in a second paradigm during adolescence, a peak period of vulnerability for many psychiatric disorders. The authors also used machine learning tools to identify specific behaviors during the cognitive task that predicted later susceptibility to the ABA paradigm. While the benefits of this system are clear, the rigor and reproducibility of experiments using this paradigm would be enhanced if the authors provided clear guidelines about which parameters and analyses are most useful. In their absence, the large amount of data generated can promote p-hacking.

      The authors use their automated behavioral testing paradigm to ask whether cognitive inflexibility is a cause or consequence of susceptibility to ABA, an issue that cannot be addressed in AN. They provide compelling evidence that there are reciprocal effects of the two behavioral paradigms, but do not perform the controls needed to evaluate the significance of these observations. For example, the learning task involves sucrose consumption and food restriction, conditions that can independently affect susceptibility to the ABA. Similarly, the ABA paradigm involves exercise and restricted access to food, which can both affect learning.

      In the Discussion, the authors hypothesize that the ABA paradigm produces cognitive inflexibility and argue that uncovering the underlying mechanism can be used to identify new therapeutic targets for AN. The rationale for their claim of translational relevance is undermined by the fact that the biggest effect of the ABA paradigm is seen in the pair discrimination task, and not reversal learning. This pattern does not fit clinical observations in AN.

      In summary, the significance of this manuscript lies in the development of a new system to test cognitive function in rats that can be combined with other paradigms to explore questions of causality. While the authors clearly demonstrate that cognitive flexibility does not promote susceptibility to ABA, the experiments presented do not provide a compelling case that their model captures important features of the pathophysiology of AN.

      We thank the reviewer for this detailed review and note that we have now both explicitly defined the most useful parameters for analyses from the novel touchscreen system as well as removed some comparisons that could be considered superfluous. We argue that the additional information provided by the machine learning analyses is, at this stage, exploratory, and rather than reveal independent descriptions of behavioural change in ABA exposed versus naïve rats this information will aid in the generation of hypotheses to be tested in future studies. Therefore, the figures pertaining to these analyses have now been provided as supplements to Figures 3 & 4 (Figure 3-figure supplement 3; Figure 4-figure supplements 3&4). We have also clarified our intention to explore possible behavioural differences using this technique in the methods and discussion.

      We have also completed the essential control experiment, defined in the “essential revisions” section of this review, whereby we show only moderate impairments in reversal learning following a matched period of food restriction without rapid weight loss, suggesting that the substantial impairment seen following ABA exposure was not due to food restriction alone (see updated Figure 4 and supplements).

      However, we do not agree with this reviewer “that the biggest effect of the ABA paradigm is seen in the pair discrimination task” and point to the outcomes of both reciprocal experiments.

      In the first experiment, rats that went on to be susceptible or resistant to ABA did not differ on pairwise discrimination learning but specifically on performance at the reversal of reward contingencies (Figure 3B & E). Although this result was not in the hypothesised direction, this suggests that reversal learning specifically and not pairwise discrimination can differentiate those rats that go on to be susceptible to weight loss. We have included additional discussion in the text related to this finding (see lines 490-497).

      In the second experiment, it is clear by the number of ABA exposed rats that were unable to learn the reversal component even after being able to learn pairwise discrimination, that flexible learning is more impaired by ABA. While it is true that ABA exposed rats that were successful in learning the reversal task were slower to learn the pairwise discrimination component than naïve rats (Figure 4E), this was not related to their ability to learn the reversal task overall – with equivalent learning rates in pairwise discrimination to ABA exposed rats that failed to learn the reversal component (Figure 4G-I). The absence of significant differences between ABA exposed and naïve animals in Figure 4F relates to the fact that the large proportion of ABA exposed animals never reached performance criterion in the reversal phase of the task and therefore data from these animals could not be included in the figure. This is where the trials completed within each session becomes important for interpretation (i.e. Figure 4-figure supplement 1M-O), whereby ABA exposure caused impaired responding specifically within the reversal phase of the task. The results text has been updated to better reflect this critical point.

      Overall, this suggests that the impairment in cognitive flexibility caused by ABA exposure was related both to an associative learning impairment (slower to learn PD than naïve animals) and an impairment in the integration of new and existing learning (failure to learn R1 in a large proportion of animals).

    1. one must conclude that community is always in/with time, always unfinished,

      Pauline van Mourik Broekman: And community is also always in/with space. In that respect, it seems so important to recognise how hard editors of ‘living books’ actually find it to encourage the reuse/appropriation/disappropriation offered up, and quite how much (material, socialised) time and care it takes to coax – and perform – this activity sensitively, on- and offline, with all the nuances you’ve described (and which run counter to the ‘social’ as the metricised communicating human being is now supposed to perform – and seek – it, and whose conditions of ‘communication’ Jodi Dean has done a lot to theorise).

      My PhD research on early Soviet life made me realise it is just really hard to conceive of the experience of true convulsive collectivity (a loss of individuality that I realise may be different, but that I hope might also be compared to the forms of subjectivation inherent in disappropriation?). And how creativity, let alone ‘authorship’, might be experienced within that. Do we (and I am thinking here especially of scholarly workers) come anything close to Walter Benjamin’s experience, from 1927, of how “Each thought, each day, each life lies here [in Moscow] as on a laboratory table. … No organism, no organisation, can escape this process.” Sensations which are also documented in Richard Stites’ Revolutionary Dreams: Utopian Vision and Experimental Life in the Russian Revolution, Oxford: Oxford University Press, 1989; and similarly, in Kristin Ross’s works on the Paris commune (Ross, 2008, 2015). The Soviet concept of the ‘social condenser’ is fascinating in this respect in that it places architecture, and space/s, right in the centre of psychosocial subjectivation, as a potentially intensifying, opening or collectivising force in social movement and change (as some have commented, these might importantly be separated into ‘planned’ and ‘accidental’ social condensers, meaning those which are forward-looking and intentional, or retroactively recognised for their capacities).

      If, as Teju Cole so memorably described, we have achieved the sort of collective spectacular alienation wherein we can witness ‘death in the browser tab’ while sitting still in front of a computer and toggling between that and other media ‘content’ (The New York Times Magazine, 2015, and online: https://www.nytimes.com/2015/05/24/magazine/death-in-the-browser-tab.html), how are we to expand living books’ writerly ‘space’ such that the tabs which living books’ readers/writers painstakingly write into might truly act as social condensers, in line with the more fervent hopes and dreams of ‘radical’ open access? As we sit at those computers, writing, our bodies slumped in chairs and our eyes tired and glazed, should we, can we, seek an experience of elated social dissolution the likes of which I’ve in recent times only seen described by authors contemplating the psychological experience of riots (e.g. Hannah Black, 2022; Tobi Haslett, 2021; Adrian Wohlleben, 2021)? It is a vain imagining, probably, but I can’t help but wonder how we might try and think of these phenomena together, or at least as potentially related. To me it seems inevitably to point to the fact that we cannot conceive of digital materials outside of the spaces in which they are engaged with. I’ve found Mark Nowak’s Social Poetics (2020) and June Jordan’s Poetry for the People (1995) some of the more helpful sources to think this relationship through (though I realise there are countless others). It also seems telling that they are to a lesser or greater extent centred in interpretations of communal pedagogy.

    1. Author Response

      Reviewer #1 (Public Review):

      Strengths

      This paper is well situated theoretically within the habit learning/OCD literature. Daily training in a motor-learning task, delivered via smartphone, was innovative, ecologically valid and more likely to assay habitual behaviors specifically. Daily training is also more similar to studies with non-humans, making a better link with that literature. The use of a sequential-learning task (cf. tasks that require a single response) is also more ecologically valid. The in-laboratory tests (after the 1 month of training) allowed the researchers to test if the OCD group preferred familiar, but more difficult, sequences over newer, simpler sequences.

      The authors achieved their aims in that two groups of participants (patients with OCD and controls) engaged with the task over the course of 30 days. The repeated nature of the task meant that 'overtraining' was almost certainly established, and automaticity was demonstrated. This allowed the authors to test their hypotheses about habit learning. The results are supportive of the authors' conclusions.

      We truly appreciate the positive assessment of referee 1, particularly the consideration that our study is theoretically strong and that ‘the results are supportive of the authors' conclusions’. This is an important external endorsement of our conclusions, contrasting somewhat with the views of referee 2.

      Weaknesses

      The sample size was relatively small. Some potentially interesting individual differences within the OCD group could have been examined more thoroughly with a bigger sample (e.g., preference for familiar sequences). A larger sample may have allowed the statistical testing of any effects due to medication status.

      The authors were not able to test one criterion of habits, namely resistance to devaluation, due to the nature of the task.

      We agree with the reviewer that the proof of principle established in our study opens new avenues for research into the psychological and behavioral determinants of the heterogeneity of this clinical population. However, considering the study timeline and the pandemic constraints, a bigger sample was not possible. Our sample can indeed be considered small if one compares it with current online studies, which do not require in-person/laboratory testing, thus being much easier to recruit and conduct. However, given the nature of our protocol (with 2 demanding test phases, 1-month engagement per participant and the inclusion of OCD patients without comorbidities only) and the fact that this study also involved laboratory testing, we consider our sample size reasonable and comparable to other laboratory studies (typically comprising on average between 30-50 participants in each group).

      This article is likely to be impactful -- the delivery of a task across 30 days to a patient group is innovative and represents a new approach for the study of habit learning that is superior to an in-laboratory approach.

      An interesting aspect of this manuscript is that it prompts a comparison with previous studies of goal-directed/habitual responding in OCD that used devaluation protocols, and which may have had their effects due to deficits in goal-directed behavior and not enhanced habit learning per se.

      Thank you for acknowledging the impact of our study, in particular the unique ability of our task to interrogate the habit system.

      Reviewer #2 (Public Review):

      In this study, the researchers employed a recently developed smartphone application to provide 30 days of training on action sequences to both OCD patients and healthy volunteers. The study tested learning and automaticity-related measures and investigated the effects of several factors on these measures. Upon training completion, the researchers conducted two preference tests comparing learned and unlearned action sequences under different conditions. While the study provides some interesting findings, I have a few substantial concerns:

      1) Throughout the entire paper, the authors' interpretations and claims revolve around the domain of habits and goal-directed behavior, despite the methods and evidence clearly focusing on motor sequence learning/procedural learning/skill learning. There is no evidence to support this framing and interpretation and thus I find them overreaching and hyperbolic, and I think they should be avoided. Although skills and habits share many characteristics, they are meaningfully distinguishable and should not be conflated or mixed up. Furthermore, if anything, the evidence in this study suggests that participants attained procedural learning, but these actions did not become habitual, as they remained deliberate actions that were not chosen to be performed when they were not in line with participants' current goals.

      We acknowledge that the research on habit learning is a topic of current controversy, especially when it comes to how to induce and measure habits in humans. Therefore, within this context, referee 2’s criticism could be expected. Across distinct fields of research, different methodologies have been used to measure habits, which represent relatively stereotyped and autonomous behavioral sequences enacted in response to a specific stimulus without consideration, at the time of initiation of the sequence, of the value of the outcome or any representation of the relationship that exists between the response and the outcome. Hence these are stimulus-bound responses which may or may not require the implementation of a skill during subsequent performance. Behavioral neuroscientists define habits similarly, as stimulus-response associations which are independent of reward or outcome, and use devaluation or contingency degradation strategies to probe habits (Dickinson and Weiskrantz, 1985; Tricomi et al., 2009). Others conceptualize habits as a form of procedural memory, along with skills, and use motor sequence learning paradigms to investigate and dissect different components of habit learning such as action selection, execution and consolidation (Abrahamse et al., 2013; Doyon et al., 2003; Squire et al., 1993). It is also generally agreed that the autonomous nature of habits and the fluid proficiency of skills are both usually achieved with many hours of training or practice, respectively (Haith and Krakauer, 2018).

      We consider that Balleine and Dezfouli (2019) made an excellent attempt to bring all these different criteria within a single framework, which we have followed. We also consider that our discussion in fact followed a rather cautious approach to interpretation solely in terms of goal-directed versus habitual control.

      Referee 2 does not actually specify criteria by which they define habits and skills, except for asserting that skilled behavior is goal-directed, without mentioning what the actual goal of the implementation of such a skill is in the present study: the fulfillment of a habit? We assume that their definition of habit hinges on the effects of devaluation as a single criterion of habit, which according to Balleine and Dezfouli (2019) is only 1 of their 4 listed criteria. We carefully addressed this specific criterion in our manuscript: “We were not, however, able to test the fourth criterion, of resistance to devaluation. Therefore, we are unable to firmly conclude that the action sequences are habits rather than, for example, goal-directed skills. Regardless of whether the trained action sequences can be defined as habits or goal-directed motor skills, it has to be considered…”. Therefore, we took due care in our conclusions concerning habits and thus found the referee’s comment misleading and unfair.

      We note that our trained motor sequences did in fact fulfil the other 3 criteria listed by Balleine and Dezfouli (2019), unlike many studies employing only devaluation (e.g. Tricomi et al 2009; Gillan et al 2011). Moreover, we cited a recent study using very similar methodology where the devaluation test was applied and shown to support the habit hypothesis (Gera et al., 2022).

      Whether the initiation of the trained motor sequences in experiment 3 (arbitration) is underpinned by an action-outcome association (or not) has no bearing on whether those sequences were under stimulus-response control after training (experiment 1). Transitions between habitual and goal-directed control over behavior are quite well established in the experimental literature, especially when choice opportunities become available (Bouton, 2021; Frölich et al., 2023) or a new goal-directed schema is recruited to fulfill a habit (Fouyssac et al., 2022). This switching between habits and goal-directed responding may reflect the coordination of these systems in producing effective behavior in the real world.

      • Fouyssac M, Peña-Oliver Y, Puaud M, Lim NTY, Giuliano C, Everitt BJ, Belin D. (2021). Negative Urgency Exacerbates Relapse to Cocaine Seeking After Abstinence. Biological Psychiatry. doi: 10.1016/j.biopsych.2021.10.009

      • Frölich S, Esmeyer M, Endrass T, Smolka MN and Kiebel SJ (2023) Interaction between habits as action sequences and goal-directed behavior under time pressure. Front. Neurosci. 16:996957. doi: 10.3389/fnins.2022.996957

      • Bouton ME. 2021. Context, attention, and the switch between habit and goal-direction in behavior. Learn Behav 49:349– 362. doi:10.3758/s13420-021-00488-z

      2) Some methodological aspects need more detail and clarification.

      3) There are concerns regarding some of the analyses, which require addressing.

      We thank referee 2 for their detailed review of the methods and analyses of our study and for the helpful feedback, which clearly helps improve our manuscript. We will clarify the methodological aspects in detail and conduct the suggested analysis. Please see below our answers to the specific points raised.

      Introduction:

      4) It is stated that "extensive training of sequential actions would more rapidly engage the 'habit system' as compared to single-action instrumental learning". In an attempt to describe the rationale for this statement the authors describe the concept of action chunking, its benefits and relevance to habits but there is no explanation for why sequential actions would engage the habit system more rapidly than a single-action. Clarifying this would be helpful.

      We agree that there is no evidence that action sequences become habitual more readily than single actions, although action sequences clearly allow ‘chunking’ and thus likely engage neural networks including the putamen which are implicated in habit learning as well as skill. In our revised manuscript we will instead state: “we have recently postulated that extensive training of sequential actions could be a means for rapidly engaging the ‘habit system’ (Robbins et al., 2019)”.

      5) In the Hypothesis section the authors state: “we expected that OCD patients... show enhanced habit attainment through a greater preference for performing familiar app sequences when given the choice to select any other, easier sequence”. I find it particularly difficult to interpret preference for familiar sequences as enhanced habit attainment.

      We agree that choice of the familiar response sequence should not be a necessary criterion for habitual control although choice for a familiar sequence is, in fact, not inconsistent with this hypothesis. In a recent study, Zmigrod et al (2022) found that 'aversion to novelty' was a relevant factor in the subjective measurement of habitual tendencies. It should also be noted that this preference was present in patients with OCD. If one assumes instead, like the referee, that the familiar sequence is goal-directed, then it contravenes the well-known 'egodystonia' of OCD which suggests that such tendencies are not goal-directed.

      To clarify our hypothesis, we will amend the sentence to the following: “Finally, we expected that OCD patients would generally report greater habits, as well as attribute higher intrinsic value to the familiar app sequences manifested by a greater preference for performing them when given the choice to select any other, easier sequence”.

      A few notes on the task description and other task components:

      6) It would be useful to give more details on the task. This includes more details on the time/condition of the gradual removal of visual and auditory stimuli and also on the within practice dynamic structure (i.e., different levels appear in the video).

      These details will be included in the revised manuscript. Thank you for pointing out the need for further clarification of the task design.

      7) Some more information on engagement-related exclusion criteria would be useful (what happened if participants did not use the app for more than one day, how many times were allowed to skip a day etc.).

      This additional information will be added to the revised manuscript. If participants omitted to train for more than 2 days, the researcher would send a reminder asking the participant to catch up. If the participant did not respond and skipped a third day, the researcher would call to understand the reasons for the lack of engagement and gauge motivation. The participant would be excluded if more than 5 sequential days of training were missed. Only 2 participants were excluded for lack of engagement.

      8) According to the (very useful) video demonstrating the task and the paper describing the task in detail (Banca et al., 2020), the task seems to include other relevant components that were not mentioned in this paper. I refer to the daily speed test, the daily random switch test, and daily ratings of each sequence's enjoyment and confidence of knowledge.

      If these components were not included in this procedure, then the deviations from the procedure described in the video and Banca et al. (2020) should be explicitly mentioned. If these components were included, at least some of them may be relevant, at least in part, to automaticity, habitual action control, formulation of participants' enjoyment from the app etc. I think these components should be mentioned and analyzed (or at least provide an explanation for why it has been decided not to analyze them).

      This is also true for the reward removal (extinction) from the 21st day onwards which is potentially of particular relevance for the research questions.

      The task procedure was indeed the same as detailed in Banca et al., 2020. We did not include these extra components in this current manuscript for reasons of succinctness and because the manuscript was already rather longer than a common research article, given that we present three different, though highly inter-dependent, experiments in order to answer key interrelated questions in an optimal manner. However, since referee 2 considers this additional analysis to be important, we will be happy to include it in the supplementary material of the revised manuscript.

      Training engagement analysis:

      9) I find referring to the number of trials including successful and unsuccessful trials as representing participants "commitment to training" (e.g. in Figure legend 2b) potentially inadequate. Given that participants need at least 20 successful trials to complete each practice, more errors would lead to more trials. Therefore, I think this measure may mostly represent weaker performance (of the OCD patients as shown in Figure 2b). Therefore, I find the number of performed practice runs, as used in Figure 2a (which should be perfectly aligned with the number of successful trials), a "clean" and proper measure of engagement/commitment to training.

      We acknowledge the referee’s concern on this matter and agree to replace the y-axis variable of Figure 2b with the number of performed practices (thus aligning with Figure 2a). This amendment will remove any potential effect of weaker performance on the engagement measurement and will provide clearer results.

      10) Also, to provide stronger support for the claim about different diurnal training patterns (as presented in Figure 2c and the text) between patients and healthy individuals, it would be beneficial to conduct a statistical test comparing the two distributions. If the results of this test are not significant, I suggest emphasizing that this is a descriptive finding.

      We will conduct the statistical test and report accordingly.

      Learning results:

      11) When describing the Learning results (p10) I think it would be useful to provide the descriptive stats for the MT0 parameter (as done above for the other two parameters).

      Thank you for pointing this out. The descriptive stats for MT0 will be added to the revised version of the manuscript.

      12) Sensitivity of sequence duration and IKI consistency (C) to reward:

      I think it is important to add details on how incorrect trials were handled when calculating ∆MT (or C) and ∆R, specifically in cases where the trial preceding a successful trial was unsuccessful. If incorrect trials were simply ignored, this may not adequately represent trial-by-trial changes, particularly when testing the effect of a trial's outcome on performance change in the next trial.

      This is an important question. Our analysis protocol was designed to ensure that incorrect trials do not contaminate or confound the results. To estimate the trial-to-trial differences ∆MT (or C) and ∆R, we exclusively included pairs of contiguous trials where participants achieved correct performance and received feedback scores for both trials. For example, if a participant made a performance error on trial 23, we did not include ∆R or ∆MT estimates for the pairs of trials 23-22 and 24-23. Instead of excluding incorrect trials from our analyses, we retained them in our time series but assigned them a NaN (not a number) value in Matlab. As a result, ∆R and ∆MT were not defined for those two pairs of trials. The same applies to C. This approach ensured that our analyses are not confounded by incremental or decremental feedback scores between noncontiguous trials. In the past, when assessing the timing of correct actions during skilled sequence performance, we also considered only events that were preceded and followed by correct actions. This prevented effects such as post-error slowing from contaminating our results (Herrojo Ruiz et al., 2009, 2019). Therefore, we do not believe that any further reanalysis is required.

      • Ruiz MH, Jabusch HC, Altenmüller E. Detecting wrong notes in advance: neuronal correlates of error monitoring in pianists. Cerebral cortex. 2009 Nov 1;19(11):2625-39.

      • Bury G, García-Huéscar M, Bhattacharya J, Ruiz MH. Cardiac afferent activity modulates early neural signature of error detection during skilled performance. NeuroImage. 2019 Oct 1;199:704-17.
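      The pair-wise exclusion rule described in the response to point 12 can be illustrated with a minimal sketch. This is not the authors' Matlab code: the function name, the example movement times and the error flags below are hypothetical, and the sketch only assumes that incorrect trials are kept in the series as NaN so that any difference touching them is undefined.

```python
import math

def contiguous_deltas(values, correct):
    """Trial-to-trial differences, defined only for pairs of contiguous correct trials."""
    # Incorrect trials stay in the series but are masked with NaN.
    masked = [v if ok else math.nan for v, ok in zip(values, correct)]
    # NaN propagates through subtraction, so any pair touching an error is NaN.
    return [b - a for a, b in zip(masked, masked[1:])]

# Hypothetical movement times (seconds); the third trial is an error.
mt = [2.0, 1.5, 1.75, 1.25, 1.0]
correct = [True, True, False, True, True]

deltas = contiguous_deltas(mt, correct)
# Only the first and last differences survive; the two pairs around the error are dropped.
valid = [d for d in deltas if not math.isnan(d)]
```

      A single error therefore removes two differences (the pair ending on the error and the pair starting from it), which matches the trial 23 example given in the response.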

      13) I have a serious concern with respect to how the sensitivity of sequence duration to reward is framed and analyzed. Since reward is proportional to performance, a reduction in reward essentially indicates a trial with poor performance, and thus even regression to the mean (along with a floor effect in performance [asymptote]) could explain the observed effects. It is possible that even occasional poor performance could lead to a participant demonstrating this effect, potentially regardless of the reward. Accordingly, the reduced improvement in performance following a reward decrease as a function of training length described in Figure 5b legend may reflect training-induced increased performance that leaves less room for improvement after poor trials, which are no longer as poor as before. To address this concern, controlling for performance (e.g., by taking into consideration the baseline MT for the previous trial) may be helpful. If the authors can conduct such an analysis and still show the observed effect, it would establish the validity of their findings.

      Thank you for raising this point. Figure 5b illustrates two distinct effects of reward changes on behavioral adaptation, which are expected based on previous research.

      I. Practice effects: Firstly, we observe that as participants progress across bins of practice, the degree of improvement in behavior (reflected by faster movement time, MT) following a decrease in reward (∆R−) diminishes, consistent with our expectations based on previous work. Conversely, we found that ∆MT does not change across bins of practices following an increase in reward (∆R+). We appreciate the reviewer's suggestion regarding controlling for the reference movement time (MT) in the previous trial when examining the practice effect in the p(∆T|∆R−) and p(∆T|∆R+) distributions. In the revised manuscript, we will conduct the proposed control analysis to better understand whether the sensitivity of MT to score decrements changes across practice when normalising MT to the reference level on each trial. But see below for a preliminary control analysis.

      II. Asymmetry of the effect of ∆R− and ∆R+ on performance: Figure 5b also depicts the distinct impact of score increments and decrements on behavioural changes. When aggregating data across practice bins, we consistently observed that the centre of the p(∆T|∆R−) distribution was smaller (more negative) than that of p(∆T|∆R+). This suggests that participants exhibited a greater acceleration following a drop in scores compared to a relative score increase, and this effect persisted throughout the practice sessions. Importantly, this enhanced sensitivity to losses or negative feedback (or relative drops in scores) aligns with previous research findings (Galea et al., 2015; Pekny et al., 2014; van Mastrigt et al., 2020).

      We have conducted a preliminary control analysis to exclude the potential impact that reference movement time (MT) values could have on our analysis. We have assessed the asymmetry between behavioural responses to ∆R− and ∆R+ using the following analysis: We estimated the proportion of trials in which participants exhibited speed-up (∆T < 0) or slow-down (∆T > 0) behaviour following ∆R− and ∆R+ across different practice bins (bins 1 to 4). By discretising the series of behavioural changes (∆T) into binary values (+1 for slowing down, -1 for speeding up), we can assess the type of changes (speed-up, slow-down) without the absolute ∆T or T values contributing to our results. We obtained several key findings:

      • Consistent with expectations (sanity check), participants exhibited more instances of speeding up than slowing down across all reward conditions.

      • Participants demonstrated a higher frequency of speeding up following ∆R− compared to ∆R+, and this asymmetry persisted throughout the practice sessions (greater proportion of -1 events than +1 events). In the p(∆T|∆R+) distribution, 53% of events were speed-up events for the first bin of practices, and 55% for the last bin. For p(∆T|∆R−), 63% of events were speed-up events in each bin of practices, with this proportion exhibiting no change over time.

      • Accordingly, the asymmetry of reward changes on behavioural adaptations, as revealed by this analysis, remained consistent across the practice bins.

      Thus, these preliminary findings provide an initial response to referee 2 and offer valuable insights into the asymmetrical effects of positive/negative reward changes on behavioural adaptations. We plan to include these results in the revised manuscript, as well as the full control analysis suggested by the referee. We will further expand upon their interpretation and implications.
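      The sign-based control analysis described above can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis code: the function name and the example values are hypothetical, and the sketch assumes that `delta_t[i]` is the behavioural change that follows the reward change `delta_r[i]`, with undefined pairs stored as NaN.

```python
import math

def speedup_proportion(delta_t, delta_r, reward_sign):
    """Share of speed-up events (delta_t < 0) among trials whose reward change has the given sign."""
    signs = []
    for dt, dr in zip(delta_t, delta_r):
        if math.isnan(dt) or math.isnan(dr) or dt == 0 or dr == 0:
            continue  # skip undefined pairs and unchanged values
        if (dr > 0) == (reward_sign > 0):
            # Discretise the change: -1 for speeding up, +1 for slowing down.
            signs.append(-1 if dt < 0 else 1)
    if not signs:
        return math.nan
    return signs.count(-1) / len(signs)

# Hypothetical reward changes and the movement-time changes that follow them.
delta_r = [-5, -3, -2, -1, 4, 6, 2, 3]
delta_t = [-0.2, -0.1, 0.3, -0.4, -0.1, 0.2, 0.1, -0.3]

after_decrease = speedup_proportion(delta_t, delta_r, reward_sign=-1)
after_increase = speedup_proportion(delta_t, delta_r, reward_sign=+1)
```

      Because only the sign of each change enters the proportion, the absolute ∆T or T values cannot drive the result; applying the same function within each practice bin would give the per-bin proportions reported in the response.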

      14) Another way to support the claim of reward change directionality effects on performance (rather than performance on performance), at least to some extent, would be to analyze the data from the last 10 days of the training, during which no rewards were given (pretending for analysis purposes that the reward was calculated and presented to participants). If the effect persists, it is less likely that the effect in question can be attributed to the reward dynamics.

      The reviewer’s concern is addressed in the previous question. Also, this analysis would not be possible because our Gaussian fit analyses use the time series of continuous reward scores, in which ∆R− or ∆R+ are embedded. These events cannot be analyzed once reward feedback is removed because we do not have behavioral events following ∆R− or ∆R+ anymore.

      15) This concern is also relevant and should be considered with respect to the sensitivity of IKI consistency (C) to reward. While the relationship between previous reward/performance and future performance in terms of C is of a different structure, the similar potential confounding effects could still be present.

      We will conduct this analysis for the revised manuscript, similarly to the control analysis suggested by referee 2 on MT. Our preliminary control analysis, as explained above, suggests that the fundamental asymmetry in the effect of ∆R− and ∆R+ on behavioral changes persists when excluding the impact of reference performance values in our Gaussian fit analysis.

      16) Another related question (which is also of general interest) is whether the preferred app sequence (as indicated by the participants for Phase B) was consistently the one that yielded more reward? Was the continuous sequence the preferred one? This might tell something about the effectiveness of the reward in the task.

      We have now conducted this analysis. There is in fact no evidence to conclude that the continuously rewarded sequence was the preferred one. The result shows that 54.5% of HV and 29% of the OCD sample considered the continuous sequence to be their preferred one. Of note, this preference may not necessarily be linked to the trial-by-trial reward sensitivity analysis. The latter assesses how learning may be affected by reward. The overall preference may be influenced by many other factors, such as, for example, the aesthetic appeal of particular combinations of finger movements.

      Regarding both experiments 2 and 3:

      17) The change in context in experiments 2 and 3 is substantial and includes many different components. These changes should be mentioned in more detail in the Results section before describing the results of experiments 2 and 3.

      Following the referee’s advice, we will move these details (currently written in the Methods section) to the Results section, when we introduce Phase B and before describing the results of experiments 2 and 3.

      Experiment 2:

      18) In Experiment 2, the authors sometimes refer to the "explicit preference task" as testing for habitual and goal-seeking sequences. However, I do not think there is any justification for interpreting it as such. The other framings used by the authors - testing whether trained action sequences gain intrinsic/rewarding properties or value, and preference for familiar versus novel action sequences - are more suitable and justified. In support of the point I raised here, assigning intrinsic rewarding properties to the learned sequences and thereby preferring these sequences can be conceptually aligned with goal-directed behavior just as much as it could be with habit.

      We clearly defined the theoretical framing of experiment 2 as a test of whether trained action sequences gain intrinsic value and we are pleased to hear that the referee agrees with this framing. If the referee is referring to the paragraph below (in the Discussion), we actually do acknowledge within this paragraph that a preference for the trained sequences can either be conceptually aligned with a habit OR a goal-directed behavior.

      “On the other hand, we are describing here two potential sources of evidence in favor of enhanced habit formation in OCD. First, OCD patients show a bias towards the previously trained, apparently disadvantageous, action sequences. In terms of the discussion above, this could possibly be reinterpreted as a narrowing of goals in OCD (Robbins et al., 2019) underlying compulsive behavior, in favor of its intrinsic outcomes”

      This narrowing of goals model of OCD refers to a hypothetically transitional stage of compulsion development driven by behavior having an abnormally strong, goal-directed nature, typically linked to specific values and concerns.

      If the referee is referring to the penultimate sentence of the hypothesis section, this has been amended in response to Q5. We cannot find any other possible instances in this manuscript stating that experiment 2 is a test of habitual or goal-directed behavior.

      Experiment 3:

      19) Similar to Experiment 2, I find the framing of arbitration between goal-directed/habitual behavior in Experiment 3 inadequate and unjustified. The results of the experiment suggest that participants were primarily goal-directed and there is no evidence to support the idea that this reevaluation led participants to switch from habitual to goal-directed behavior.

      Also, given the explicit choice of the sequence to perform participants had to make prior to performing it, it is reasonable to assume that this experiment mainly tested bias towards familiar sequence/stimulus and/or towards intrinsic reward associated with the sequence in value-based decision making.

      This comment is aligned with (and follows) the referee’s criticism of experiment 1 not achieving automatic and habitual actions. We have addressed this matter above, in response 1 to Referee 2.

      Mobile-app performance effect on symptomatology: exploratory analyses:

      20) Maybe it would be worth testing if the patients with improved symptomatology (that attribute some of their symptom improvement to the app) also chose to play more during the training stage.

      We have conducted an analysis to address this relevant question. There is no correlation between the YBOCS score change and the total number of practices, meaning that the patients whose symptomatology improved post-training did not necessarily choose to play the app more during the training stage (rs = 0.25, p = 0.15). Additionally, we have statistically compared the improvers (patients with reduced YBOCS scores post-training) and the non-improvers (patients with unchanged or increased YBOCS scores post-training) in their number of completed app practices during the training phase and no differences were observed (U = 169, p = 0.19).
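      For readers unfamiliar with the two statistics reported here, the following is a minimal sketch of how a rank correlation and a U statistic are computed. It assumes that rs denotes Spearman's rank correlation and U the Mann-Whitney statistic, which the response does not state explicitly; the function names and data are hypothetical, and p-values are deliberately omitted (in practice a statistics package would supply them).

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: its rank sum minus the smallest rank sum it could have."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2

# Hypothetical data: YBOCS change (negative = improvement) and total practices.
ybocs_change = [-6, -4, -2, 0, 3]
practices = [100, 150, 120, 130, 90]

rho = spearman_rho(ybocs_change, practices)
improvers = [p for c, p in zip(ybocs_change, practices) if c < 0]
non_improvers = [p for c, p in zip(ybocs_change, practices) if c >= 0]
u_stat = mann_whitney_u(improvers, non_improvers)
```

      The grouping rule mirrors the definition given in the response: improvers have a reduced YBOCS score post-training, and non-improvers have an unchanged or increased score.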

      Discussion:

      21) Based on my earlier comments highlighting the inadequacy and mis-framing of the work in terms of habit and goal-directed behavior, I suggest that the discussion section be substantially revised to reflect these concerns.

      We do not agree that the work is either "inadequate or mis-framed" and will not therefore be substantially revising the Discussion. We will however clarify further the interpretation we have made and make explicit the alternative viewpoint of the referee. For example, we will retitle experiment 3 as “Re-evaluation of the learned action sequence: possible test of goal/habit arbitration” to acknowledge the referee’s viewpoint as well as our own interpretation.

      22) In the sentence "Nevertheless, OCD patients disadvantageously preferred the previously trained/familiar action sequence under certain conditions" the term "disadvantageously" is not necessarily accurate. While there was potentially more effort required, considering the possible presence of intrinsic reward and chunking, this preference may not necessarily be disadvantageous. Therefore, a more cautious and accurate phrasing that better reflects the associated results would be useful.

      We recognize that the term "disadvantageously" may be semantically ambiguous for some readers and therefore we will remove it.

      Materials and Methods:

      23) The authors mention: "The novel sequence (in condition 3) was a 6-move sequence of similar complexity and difficulty as the app sequences, but only learned on the day, before starting this task (therefore, not overtrained)." - for the sake of completeness, more details on the pre-training done on that day would be useful.

      Details of the learning procedure of the novel sequence (in condition 3, experiment 3) will be provided in the methods of the revised version of the manuscript.

      Minor comments:

      24) In the section discussing the sensitivity of sequence duration to reward, the authors state that they only analyzed continuous reward trials because "a larger number of trials in each subsample were available to fit the Gaussian distributions, due to feedback being provided on all trials." However, feedback was also provided on all trials in the variable reward condition, even though the reward was not necessarily aligned with participants' performance. Therefore, it may be beneficial to rephrase this statement for clarity.

      We will follow this referee’s advice and will rephrase the sentence for clarity.

      25) With regard to experiment 2 (Preference for familiar versus novel action sequences) in the following statement "A positive correlation between COHS and the app sequence choice (Pearson r = 0.36, p = 0.005) further showed that those participants with greater habitual tendencies had a greater propensity to prefer the trained app sequence under this condition." I find the use of the word "further" here potentially misleading.

      The word "further" will be removed.

    2. Reviewer #2 (Public Review):

      In this study, the researchers employed a recently developed smartphone application to provide 30 days of training on action sequences to both OCD patients and healthy volunteers. The study tested learning and automaticity-related measures and investigated the effects of several factors on these measures. Upon training completion, the researchers conducted two preference tests comparing learned and unlearned action sequences under different conditions. While the study provides some interesting findings, I have a few substantial concerns:

      1. Throughout the entire paper, the authors' interpretations and claims revolve around the domain of habits and goal-directed behavior, despite the methods and evidence clearly focusing on motor sequence learning/procedural learning/skill learning. There is no evidence to support this framing and interpretation and thus I find them overreaching and hyperbolic, and I think they should be avoided. Although skills and habits share many characteristics, they are meaningfully distinguishable and should not be conflated or mixed up. Furthermore, if anything, the evidence in this study suggests that participants attained procedural learning, but these actions did not become habitual, as they remained deliberate actions that were not chosen to be performed when they were not in line with participants' current goals.<br /> 2. Some methodological aspects need more detail and clarification.<br /> 3. There are concerns regarding some of the analyses, which require addressing.

      Please see details below, ordered by the paper sections.

      Introduction:<br /> It is stated that "extensive training of sequential actions would more rapidly engage the 'habit system' as compared to single-action instrumental learning". In an attempt to describe the rationale for this statement the authors describe the concept of action chunking, its benefits and relevance to habits but there is no explanation for why sequential actions would engage the habit system more rapidly than a single-action. Clarifying this would be helpful.

      In the Hypothesis section the authors state: "we expected that OCD patients... show enhanced habit attainment through a greater preference for performing familiar app sequences when given the choice to select any other, easier sequence." I find it particularly difficult to interpret preference for familiar sequences as enhanced habit attainment.

      A few notes on the task description and other task components:<br /> It would be useful to give more details on the task. This includes more details on the time/condition of the gradual removal of visual and auditory stimuli and also on the within practice dynamic structure (i.e., different levels appear in the video).

      Some more information on engagement-related exclusion criteria would be useful (what happened if participants did not use the app for more than one day, how many times were allowed to skip a day etc.).

      According to the (very useful) video demonstrating the task and the paper describing the task in detail (Banca et al., 2020), the task seems to include other relevant components that were not mentioned in this paper. I refer to the daily speed test, the daily random switch test, and daily ratings of each sequence's enjoyment and confidence of knowledge.<br /> If these components were not included in this procedure, then the deviations from the procedure described in the video and Banca et al. (2020) should be explicitly mentioned. If these components were included, at least some of them may be relevant, at least in part, to automaticity, habitual action control, formulation of participants' enjoyment from the app etc. I think these components should be mentioned and analyzed (or an explanation should at least be provided for why it has been decided not to analyze them).<br /> This is also true for the reward removal (extinction) from the 21st day onwards which is potentially of particular relevance for the research questions.

      Training engagement analysis:<br /> I find referring to the number of trials including successful and unsuccessful trials as representing participants "commitment to training" (e.g. in Figure legend 2b) potentially inadequate. Given that participants need at least 20 successful trials to complete each practice, more errors would lead to more trials. Therefore, I think this measure may mostly represent weaker performance (of the OCD patients as shown in Figure 2b). Therefore, I find the number of performed practice runs, as used in Figure 2a (which should be perfectly aligned with the number of successful trials), a "clean" and proper measure of engagement/commitment to training.

      Also, to provide stronger support for the claim about different diurnal training patterns (as presented in Figure 2c and the text) between patients and healthy individuals, it would be beneficial to conduct a statistical test comparing the two distributions. If the results of this test are not significant, I suggest emphasizing that this is a descriptive finding.
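      One way such a comparison could be run is a permutation test on the hourly practice profiles of the two groups. The following is only a minimal sketch under stated assumptions: the function names, the input layout (one list of session hours per participant), and the choice of total variation distance as the test statistic are all illustrative and are not taken from the manuscript. Group labels are shuffled at the participant level because sessions from the same person are not independent.

```python
import random
from collections import Counter


def hourly_profile(hours):
    """Fraction of practice sessions that fall in each hour 0..23."""
    counts = Counter(hours)
    n = len(hours)
    return [counts.get(h, 0) / n for h in range(24)]


def profile_distance(hours_a, hours_b):
    """Total variation distance between two hourly profiles (0 = identical)."""
    pa, pb = hourly_profile(hours_a), hourly_profile(hours_b)
    return 0.5 * sum(abs(x - y) for x, y in zip(pa, pb))


def pool(group):
    """Flatten a list of per-participant hour lists into one list."""
    return [h for person in group for h in person]


def permutation_test(patients, controls, n_perm=2000, seed=0):
    """Permute group labels across participants (hypothetical data layout).

    patients, controls: lists of per-participant lists of session hours.
    Returns the observed distance and a one-sided permutation p-value.
    """
    rng = random.Random(seed)
    observed = profile_distance(pool(patients), pool(controls))
    everyone = list(patients) + list(controls)
    n_patients = len(patients)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(everyone)
        d = profile_distance(pool(everyone[:n_patients]),
                             pool(everyone[n_patients:]))
        if d >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

      Because hour of day is a circular variable, a circular-statistics test would be an equally reasonable choice; the permutation approach is shown only because it makes few distributional assumptions.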

      Learning results:<br /> When describing the Learning results (p10) I think it would be useful to provide the descriptive stats for the MT0 parameter (as done above for the other two parameters).

      Sensitivity of sequence duration and IKI consistency (C) to reward:<br /> I think it is important to add details on how incorrect trials were handled when calculating ∆MT (or C) and ∆R, specifically in cases where the trial preceding a successful trial was unsuccessful. If incorrect trials were simply ignored, this may not adequately represent trial-by-trial changes, particularly when testing the effect of a trial's outcome on performance change in the next trial.

      I have a serious concern with respect to how the sensitivity of sequence duration to reward is framed and analyzed. Since reward is proportional to performance, a reduction in reward essentially indicates a trial with poor performance, and thus even regression to the mean (along with a floor effect in performance [asymptote]) could explain the observed effects. It is possible that even occasional poor performance could lead to a participant demonstrating this effect, potentially regardless of the reward. Accordingly, the reduced improvement in performance following a reward decrease as a function of training length described in Figure 5b legend may reflect training-induced increased performance that leaves less room for improvement after poor trials, which are no longer as poor as before. To address this concern, controlling for performance (e.g., by taking into consideration the baseline MT for the previous trial) may be helpful. If the authors can conduct such an analysis and still show the observed effect, it would establish the validity of their findings.<br /> Another way to support the claim of reward change directionality effects on performance (rather than performance on performance), at least to some extent, would be to analyze the data from the last 10 days of the training, during which no rewards were given (pretending for analysis purposes that the reward was calculated and presented to participants). If the effect persists, it is less likely that the effect in question can be attributed to the reward dynamics.<br /> This concern is also relevant and should be considered with respect to the sensitivity of IKI consistency (C) to reward. While the relationship between previous reward/performance and future performance in terms of C is of a different structure, similar potential confounding effects could still be present.
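      The regression-to-the-mean concern can be illustrated with a small simulation. This is a sketch under explicit assumptions, not a reanalysis of the study data: movement time (MT) on every trial is drawn independently around a fixed asymptote, reward is any decreasing function of MT, and the performer never reacts to reward. The definitions of the reward change and of the subsequent MT change used here are assumed for illustration and may differ from those in the manuscript.

```python
import random
from statistics import mean


def simulate(n_trials=5000, mt_mean=2.0, mt_sd=0.3, seed=1):
    """Mean next-trial change in MT after reward decreases vs. increases.

    MT values are independent draws, so by construction reward has
    no causal influence on later performance in this simulation.
    """
    rng = random.Random(seed)
    mt = [rng.gauss(mt_mean, mt_sd) for _ in range(n_trials)]
    # Assumed reward rule: faster sequences earn more (any decreasing map works).
    reward = [-x for x in mt]

    after_decrease, after_increase = [], []
    for t in range(1, n_trials - 1):
        d_reward = reward[t] - reward[t - 1]   # change in reward on trial t
        d_mt_next = mt[t + 1] - mt[t]          # change in MT on the next trial
        if d_reward < 0:
            after_decrease.append(d_mt_next)
        else:
            # Exact ties are practically impossible with continuous draws.
            after_increase.append(d_mt_next)
    return mean(after_decrease), mean(after_increase)


if __name__ == "__main__":
    dec, inc = simulate()
    print(f"mean dMT after reward decrease: {dec:+.3f}")
    print(f"mean dMT after reward increase: {inc:+.3f}")
```

      In this setting MT shortens on average after a reward decrease and lengthens after a reward increase, even though reward plays no role. An apparent reward sensitivity of this kind should shrink toward zero once the previous trial's MT is included as a covariate, which is the control suggested above.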

      Another related question (which is also of general interest) is whether the preferred app sequence (as indicated by the participants for Phase B) was consistently the one that yielded more reward? Was the continuous sequence the preferred one? This might tell something about the effectiveness of the reward in the task.

      Regarding both experiments 2 and 3:<br /> The changes in context in experiments 2 and 3 are substantial and include many different components. These changes should be mentioned in more detail in the Results section before describing the results of experiments 2 and 3.

      Experiment 2:<br /> In Experiment 2, the authors sometimes refer to the "explicit preference task" as testing for habitual and goal-seeking sequences. However, I do not think there is any justification for interpreting it as such. The other framings used by the authors - testing whether trained action sequences gain intrinsic/rewarding properties or value, and preference for familiar versus novel action sequences - are more suitable and justified. In support of the point I raised here, assigning intrinsic rewarding properties to the learned sequences and thereby preferring these sequences can be conceptually aligned with goal-directed behavior just as much as it could be with habit.

      Experiment 3:<br /> Similar to Experiment 2, I find the framing of arbitration between goal-directed/habitual behavior in Experiment 3 inadequate and unjustified. The results of the experiment suggest that participants were primarily goal-directed and there is no evidence to support the idea that this re-evaluation led participants to switch from habitual to goal-directed behavior.<br /> Also, given the explicit choice of the sequence to perform participants had to make prior to performing it, it is reasonable to assume that this experiment mainly tested bias towards familiar sequence/stimulus and/or towards intrinsic reward associated with the sequence in value-based decision making.

      Mobile-app performance effect on symptomatology: exploratory analyses:<br /> Maybe it would be worth testing if the patients with improved symptomatology (who attribute some of their symptom improvement to the app) also chose to play more during the training stage.

      Discussion:<br /> Based on my earlier comments highlighting the inadequacy and mis-framing of the work in terms of habit and goal-directed behavior, I suggest that the discussion section be substantially revised to reflect these concerns.

      In the sentence "Nevertheless, OCD patients disadvantageously preferred the previously trained/familiar action sequence under certain conditions" the term "disadvantageously" is not necessarily accurate. While there was potentially more effort required, considering the possible presence of intrinsic reward and chunking, this preference may not necessarily be disadvantageous. Therefore, a more cautious and accurate phrasing that better reflects the associated results would be useful.

      Materials and Methods:<br /> The authors mention: "The novel sequence (in condition 3) was a 6-move sequence of similar complexity and difficulty as the app sequences, but only learned on the day, before starting this task (therefore, not overtrained)." - for the sake of completeness, more details on the pre-training done on that day would be useful.

      Minor comments:<br /> In the section discussing the sensitivity of sequence duration to reward, the authors state that they only analyzed continuous reward trials because "a larger number of trials in each subsample were available to fit the Gaussian distributions, due to feedback being provided on all trials." However, feedback was also provided on all trials in the variable reward condition, even though the reward was not necessarily aligned with participants' performance. Therefore, it may be beneficial to rephrase this statement for clarity.

      With regard to experiment 2 (Preference for familiar versus novel action sequences) in the following statement "A positive correlation between COHS and the app sequence choice (Pearson r = 0.36, p = 0.005) further showed that those participants with greater habitual tendencies had a greater propensity to prefer the trained app sequence under this condition." I find the use of the word "further" here potentially misleading.

    1. Reviewer #2 (Public Review):

      Olszyński et al. claim that they identified a "new-type" ultrasonic vocalization around 44 kHz that occurs in response to prolonged fear conditioning (using foot-shocks of relatively high intensity, i.e. 1 mA) in rats. Typically, negative 22-kHz calls and positive 50-kHz calls are distinguished in rats, commonly by using a frequency threshold of 30 or 32 kHz. Olszyński et al. now observed so-called "44-kHz" calls in a substantial number of subjects exposed to 10 tone-shock pairings, yet call emission rate was low (according to Fig. 1G around 15%, according to the result text around 7.5%). They also performed playback experiments and concluded that "the responses to 44-kHz aversive calls presented from the speaker were either similar to 22-kHz vocalizations or in-between responses to 22-kHz and 50-kHz playbacks".

      Strengths: Detailed spectrographic analysis of a substantial data set of ultrasonic vocalizations recorded during prolonged fear conditioning, combined with playback experiments.

      Weaknesses: I see a number of major weaknesses.

      While the descriptive approach applied is useful, the findings have only focused importance and scope, given the low prevalence of "44 kHz" calls and limited attempts made to systematically manipulate factors that lead to their emission. In fact, the data presented appear to be derived from reanalyses of previously conducted studies in most cases and the main claims are only partially supported. While reading the manuscript, I got the impression that the data presented here are linked to two or three previously published studies (Olszyński et al., 2020, 2021, 2023). This is important to emphasize for two reasons: 1) It is often difficult (if not impossible) to link the reported data to the different experiments conducted before (and the individual experimental conditions therein). While reanalyzing previously collected data can lead to important insight, it is important to describe in a clear and transparent manner what data were obtained in what experiment (and more specifically, in what exact experimental condition) to allow appropriate interpretation of the data. For example, it is said that in the "trace fear conditioning experiment" both single- and group-housed rats were included, yet I was not able to tell what data were obtained in single- versus group-housed rats. This may sound like a side aspect, however, in my view this is not a side aspect given the fact that ultrasonic vocalizations are used for communication and communication is affected by the social housing conditions. 2) In at least two of the previously published manuscripts (Olszyński et al., 2021, 2023), emission of ultrasonic vocalizations was analyzed (Figure S1 in Olszyński et al., 2021, and Fig. 1 in Olszyński et al., 2023). This includes detailed spectrographic analyses covering the frequency range between 20 and 100 kHz, i.e. including the frequency range, where the "new-type" ultrasonic vocalization, now named "44 kHz" call, occurs, as reflected in the examples provided in Fig. 
1 of Olszyński et al. (2023). In the materials and methods there, it was said: "USV were assigned to one of three categories: 50-kHz (mean peak frequency, MPF >32 kHz), short 22-kHz (MPF of 18-32 kHz, <0.3 s duration), long 22-kHz (MPF of 18-32 kHz, >0.3 s duration)". Does that mean that the "44 kHz" calls were previously included in the count for 50-kHz calls? Or were 44 kHz calls (intentionally?) left out? What does that mean for the interpretation of the previously published data? What does that mean for the current data set? In my view, there is a lack of transparency here.
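      The question about the earlier categorization can be made concrete by writing the quoted rule as a function. This is a minimal sketch: the thresholds are the ones quoted above, whereas the handling of the exact boundary values (32 kHz, 0.3 s), the "unclassified" fallback, and the function name are assumptions made here for illustration.

```python
def classify_usv(mpf_khz, duration_s):
    """Apply the three-category rule quoted from Olszyński et al. (2023).

    mpf_khz: mean peak frequency in kHz; duration_s: call duration in seconds.
    Boundary handling and the fallback label are assumptions of this sketch.
    """
    if mpf_khz > 32:
        return "50-kHz"
    if 18 <= mpf_khz <= 32:
        return "short 22-kHz" if duration_s < 0.3 else "long 22-kHz"
    return "unclassified"


# A long, flat call with a mean peak frequency near 44 kHz:
print(classify_usv(44.0, 1.0))
```

      Under a literal reading of the rule, any call with a mean peak frequency above 32 kHz is assigned to the 50-kHz category regardless of its duration or modulation, which is why the fate of "44 kHz" calls in the earlier counts needs to be stated explicitly.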

      Moreover, whether the newly identified call type is indeed novel is questionable, as also mentioned by the authors in their discussion section. While they wrote in the introduction that "high-pitch (>32 kHz), long and monotonous ultrasonic vocalizations have not yet been described", they wrote in the discussion that "long (or not that long (Biały et al., 2019)), frequency-stable high-pitch vocalizations have been reported before (e.g. Sales, 1979; Shimoju et al., 2020), notably as caused by intense cholinergic stimulation (Brudzynski and Bihari, 1990) or higher shock-dose fear conditioning (Wöhr et al., 2005)" (and I wish to add that to my knowledge this list provided by the authors is incomplete). Therefore, I believe, the strong claims made in abstract ("we are the first to describe a new-type..."), introduction ("have not yet been described"), and results ("new calls") are not justified.

      In general, the manuscript is not well written/ not well organized, the description of the methods is insufficient, and it is often difficult (if not impossible) to link the reported data to the experiments/ experimental conditions described in the materials and methods section. For example, I miss a clear presentation of basic information: 1) How many rats emitted "44 kHz" calls (in total, per experiment, and importantly, also per experimental condition, i.e. single- versus group-housed)? 2) Out of the ones emitting "44 kHz" calls, what was the prevalence of "44 kHz" calls (relative to 22- and 50-kHz calls, e.g. shown as percentage)? 3) How did this ratio differ between experiments and experimental conditions? 4) Was there a link to freezing? Freezing was apparently analyzed before (Olszyński et al., 2021, 2023) and it would be important to see whether there is a correlation between "44-kHz" calls and freezing. Moreover, it would be important to know what behavior the rats are displaying while such "44-kHz" calls are emitted? (Note: Even not all 22-kHz calls are synced to freezing.) All this could help to substantiate the currently highly speculative claims made in the discussion section ("frequency increases with an increase in arousal" and "it could be argued that our prolonged fear conditioning increased the arousal of the rats with no change in the valence of the aversive stimuli"). Such more detailed analyses are also important to rule out the possibility that the "new-type" ultrasonic vocalization, the so-called "44 kHz" call, is simply associated with movement/ thorax compression.

      The figures currently included are purely descriptive in most cases - and many of them are just examples of individual rats (e.g. majority of Fig. 1, all of Fig. 2 to my understanding, with the exception of the time course, which in case of D is only a subset of rats ("only rats that emitted 44-kHz calls in at least seven ITI are plotted" - is there any rationale for this criterion?)), or, in fact, just representative spectrograms of calls (all of Fig. 3, with the exception of G, all of Fig. 4). Moreover, the differences between Fig. 5 and Fig. 6 are not clear to me. It seems Fig. 5B is included three times - what is the benefit of including the same figure three times? A systematic comparison of experimental conditions is limited to Fig. 7 and Fig. 8, the figures depicting the playback results (which led to the conclusion that "the responses to 44-kHz aversive calls presented from the speaker were either similar to 22-kHz vocalizations or in-between responses to 22-kHz and 50-kHz playbacks", although it remains unclear to me why differences were seen b e f o r e the experimental manipulation, i.e. the different playback types in Fig. 8B).

      Related to that, I miss a clear presentation of relevant methodological aspects: 1) Why were some rats single-housed but not the others? 2) Is the experimental design of the playback study not confounded? It is said that "one group (n = 13) heard 50-kHz appetitive vocalization playback while the other (n = 16) 22-kHz and 44-kHz aversive calls". How can one compare "44 kHz" calls to 22- and 50-kHz calls when "44 kHz" calls are presented together with 22-kHz calls but not 50-kHz calls? What about carry-over effects? Hearing one type of call most likely affects the response to the other type of call. It appears likely that rats are a bit more anxious after hearing aversive 22-kHz calls, for example. Therefore, it would not be very surprising to see that the response to "44 kHz" calls is more similar to 22-kHz calls than 50-kHz calls. Of note, in case of the other playback experiment it is just said that rats "received appetitive and aversive ultrasonic vocalization playback" but it remains unclear whether "44 kHz" calls are seen as appetitive or aversive. Later it says that "rats were presented with two 10-s-long playback sets of either 22-kHz or 44-kHz calls, followed by one 50-kHz modulated call 10-s set and another two playback sets of either 44-kHz or 22-kHz calls not previously heard" (and wonder what data set was included in the figures and how - pooled?). Again, I am worried about carry-over effects here. This does not seem to be an experimental design that allows to compare the response to the three main call types in an unbiased manner. Of note, what exactly is meant by "control rats" in the context of fear conditioning is also not clear to me. One can think of many different controls in a fear conditioning experiment. More concrete information is needed.

    1. Reviewer #2 (Public Review):

      Theta-nested gamma oscillations (TNGO) play an important role in hippocampal memory and cognitive processes and are disrupted in pathology. Deep brain stimulation has been shown to affect memory encoding. To investigate the effect of pulsed CA1 neurostimulation on hippocampal TNGO the authors coupled a physiologically realistic model of the hippocampus comprising EC, DG, CA1, and CA3 subfields with an abstract theta oscillator model of the medial septum (MS). Pathology was modeled as weakened theta input from the MS to EC simulating MS neurodegeneration known to occur in Alzheimer's disease. The authors show that if the input from the MS to EC is strong (the healthy state) the model autonomously generates TNGO in all hippocampal subfields while a single neurostimulation pulse has the effect of resetting the TNGO phase. When the MS input strength is weaker the network is quiescent but the authors find that a single CA1 neurostimulation pulse can switch it into the persistent TNGO state, provided the neurostimulation pulse is applied at the peak of the EC theta. If the MS theta oscillator model is supplemented by an additional phase-reset mechanism a single CA1 neurostimulation pulse applied at the trough of EC theta also produces the same effect. If the MS input to EC is weaker still, only a short burst of TNGO is generated by a single neurostimulation pulse. The authors investigate the physiological origin of this burst and find it results from an interplay of CAN and M currents in the CA1 excitatory cells. In this case, the authors find that TNGO can only be rescued by a theta frequency train of CA1 pulses applied at the peak of the EC theta or again at either the peak or trough if the MS oscillator model is supplemented by the phase-reset mechanism.

      The main strength of this model is its use of a fairly physiologically detailed model of the hippocampus. The cells are single-compartment models but do include multiple ion channels and are spatially arranged in accordance with the hippocampal structure. This allows the understanding of how ion channels (possibly modifiable by pharmacological agents) interact with system-level oscillations and neurostimulation. The model also includes all the main hippocampal subfields. The other strength is its attention to an important topic, which may be relevant for dementia treatment or prevention, which few modeling studies have addressed.

      The work has several weaknesses. First, while investigations of hippocampal neurostimulation are important there are few experimental studies from which one could judge the validity of the model findings. All its findings are therefore predictions. It would be much more convincing to first show the model is able to reproduce some measured empirical neurostimulation effect before proceeding to make predictions. Second, the model is very specific. Or if its behavior is to be considered general it has not been explained why. For example, the model shows bistability between quiescence and TNGO, however what aspect of the model underlies this, be it some particular network structure or particular ion channel, for example, is not addressed. Similarly for the various phase reset behaviors that are found. We may wonder whether a different hippocampal model of TNGO, of which there are many published (for example [1-6]) would show the same effect under neurostimulation. This seems very unlikely and indeed the quiescent state itself shown by this model seems quite artificial. Some indication that particular ion channels, CAN and M are relevant is briefly provided and the work would be much improved by examining this aspect in more detail. In summary, the work would benefit from an intuitive analysis of the basic model ingredients underlying its neurostimulation response properties. Third, while the model is fairly realistic, considerable important factors are not included and in fact, there are much more detailed hippocampal models out there (for example [5,6]). In particular, it includes only excitatory cells and a single type of inhibitory cell. This is particularly important since there are many models and experimental studies where specific cell types, for example, OLM and VIP cells, are strongly implicated in TNGO. 
Other missing ingredients one may think might have a strong impact on model response to neurostimulation (in particular stimulation trains) include the well-known short-term plasticity between different hippocampal cell types and active dendritic properties. Fourth, the MS model seems somewhat unsupported. It is modeled as a set of coupled oscillators that synchronize. However, there is also a phase reset mechanism included. This mechanism is important because it underlies several of the phase reset behaviors shown by the full model. However, it is not derived from experimental phase response curves of septal neurons of which there is no direct measurement. The work would benefit from the use of a more biologically validated MS model.

      [1] Hyafil A, Giraud AL, Fontolan L, Gutkin B. Neural cross-frequency coupling: connecting architectures, mechanisms, and functions. Trends in neurosciences. 2015 Nov 1;38(11):725-40.

      [2] Tort AB, Rotstein HG, Dugladze T, Gloveli T, Kopell NJ. On the formation of gamma-coherent cell assemblies by oriens lacunosum-moleculare interneurons in the hippocampus. Proceedings of the National Academy of Sciences. 2007 Aug 14;104(33):13490-5.

      [3] Neymotin SA, Lazarewicz MT, Sherif M, Contreras D, Finkel LH, Lytton WW. Ketamine disrupts theta modulation of gamma in a computer model of hippocampus. Journal of Neuroscience. 2011 Aug 10;31(32):11733-43.

      [4] Ponzi A, Dura-Bernal S, Migliore M. Theta-gamma phase-amplitude coupling in a hippocampal CA1 microcircuit. PLOS Computational Biology. 2023 Mar 23;19(3):e1010942.

      [5] Bezaire MJ, Raikov I, Burk K, Vyas D, Soltesz I. Interneuronal mechanisms of hippocampal theta oscillations in a full-scale model of the rodent CA1 circuit. Elife. 2016 Dec 23;5:e18566.

      [6] Chatzikalymniou AP, Gumus M, Skinner FK. Linking minimal and detailed models of CA1 microcircuits reveals how theta rhythms emerge and their frequencies controlled. Hippocampus. 2021 Sep;31(9):982-1002.

    1. Author Response

      eLife assessment

      This study assesses homeostatic plasticity mechanisms driven by inhibitory GABAergic synapses in cultured cortical neurons. The authors report that up- or down-regulation of GABAergic synaptic strength, rather than excitatory glutamatergic synaptic strength, is critical for homeostatic regulation of neuronal firing rates. The reviewers noted that the findings are potentially important, but they also raised questions. In particular, the evidence supporting the findings is currently incomplete and demonstration of independent regulation of mEPSCs and mIPSCs is a necessary experiment to support the major claims of the study.

      We appreciate the detailed, thoughtful assessment of our paper by the reviewers and editors and will submit a revised version in the future that addresses the reviewers’ comments as detailed below in response to each concern. We will include a more open discussion of alternative possibilities. Further, we will repeat the optogenetic experiments assessing AMPAergic scaling in our mouse cortical cultures in order to demonstrate independent regulation of mEPSCs and mIPSCs as suggested.

      Reviewer #1 (Public Review):

      In the manuscript titled "GABAergic synaptic scaling is triggered by changes in spiking activity rather than transmitter receptor activation," the authors present an investigation of the role of GABAergic synaptic scaling in the maintenance of spike rates in networks of cultured neurons. Their main findings suggest that GABAergic scaling exhibits features consistent with a key homeostatic mechanism that contributes to the stability of neuronal firing rates. Their data demonstrate that GABAergic scaling is multiplicative and emerges when postsynaptic spike rates are altered. Finally, their data suggest that, in contrast to their prior data on glutamatergic scaling, GABAergic scaling is driven by spike rates. The authors set the paper up as an argument that GABAergic scaling, rather than glutamatergic scaling, serves as the critical homeostatic mechanism for spike rate regulation.

      While the paper is ambitious in its rhetorical scope and certainly presents intriguing findings, there are several serious concerns that need to be addressed to substantiate the interpretations of the data. For example, the CTZ data do not support the interpretations and conclusions drawn by the authors. Summarily, the authors argue that GABAergic scaling is measuring spiking (at the time scale of the homeostatic response, which they suggest is a key feature of a homeostat) yet their data in figure 5B show more convincingly that CTZ does not influence spiking levels - only one out of four time points is marginally significant (also, I suspect that the bootstrapping method mentioned in line 454-459 was conducted as a pairwise comparison of distributions. There is no mention of multiple comparisons corrections, and I have to assume that the significance at 3h would disappear with correction).

      We certainly understand the criticism here (similar to reviewer 2’s third point). In our resubmission we will do a better job discussing these complications, which we now summarize. First, we are presenting our entire dataset to be as transparent as possible. Unlike most synaptic scaling studies (including our own) that apply drugs to alter activity and assess mPSC amplitude at the final time point, here we are actually showing CTZ’s effect on spiking activity within the culture over time. This is critical because it has informed us of the drug’s true effect on spiking, the variability that is associated with these perturbations, and the ability and timing of the cultured network to homeostatically recover initial levels. This was important because it revealed that the drugs do not always influence activity in the way we assume, and this provides greater context to our results. Second, we are showing all of our data, and presenting it using estimation statistics which go beyond the dichotomy of a simple p value yes or no (Ho J, Tumkaya T, Aryal S, Choi H, Claridge-Chang A. 2019. Moving beyond P values: data analysis with estimation graphics. Nat Methods 16: 565-66). Estimation statistics have become a more standard statistical approach in the last 15 years and are the preferred method for the Society for Neuroscience’s eNeuro Journal. This method shows the effect size and the confidence interval of the distribution. For the 3 hr time point in Fig. 5B the CTZ/ethanol vs. ethanol data points exhibit very little overlap and the effect size demonstrates a near doubling of spike frequency, and the confidence interval shows a clear separation from 0. This was a pairwise comparison as we compared values at each time point after the addition of ethanol or ethanol/CTZ. Third, the plots illustrate an upward trend in spike frequency at 1 and 6 hrs, but there is also clear variability.
It is important to note that while these recordings help us to understand effects on spiking across the cultured network, they cannot directly speak to spiking activity in the principal neurons that we target. This complication along with the variability inherent in these cultures could make simple comparisons difficult to interpret. Regardless, we do see some increase in spiking with CTZ and we clearly see increases in mIPSC amplitude, thus providing some support for the idea that spiking could be a critical player in terms of GABAergic scaling, particularly when put in the context of our other findings. However, it is important to recognize that something other than total spike rate may contribute to GABAergic scaling, such as the pattern of spiking that produces a particular calcium transient, and this will be discussed in the resubmission.
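
      The statistical question raised in this exchange (a bootstrapped confidence interval at each of four time points, and whether it survives a multiple-comparisons correction) can be illustrated with a minimal sketch. This is not the authors' analysis code: the data are synthetic, the function name `bootstrap_mean_diff_ci` is hypothetical, the groups are treated as unpaired, and a Bonferroni adjustment is assumed as one simple way to widen the interval for four comparisons.

```python
import numpy as np

def bootstrap_mean_diff_ci(control, treated, alpha=0.05, n_boot=5000, seed=0):
    """Percentile bootstrap CI for mean(treated) - mean(control)."""
    rng = np.random.default_rng(seed)
    control = np.asarray(control, dtype=float)
    treated = np.asarray(treated, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement and record the mean difference.
        c = rng.choice(control, size=control.size, replace=True)
        t = rng.choice(treated, size=treated.size, replace=True)
        diffs[i] = t.mean() - c.mean()
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return treated.mean() - control.mean(), lo, hi

# Synthetic, hypothetical values standing in for normalized spike frequency
# in vehicle-treated and drug-treated cultures at a single time point.
data_rng = np.random.default_rng(1)
control = data_rng.normal(1.0, 0.3, size=8)
treated = data_rng.normal(1.8, 0.5, size=8)

n_timepoints = 4  # number of comparisons, one per time point
effect, lo95, hi95 = bootstrap_mean_diff_ci(control, treated, alpha=0.05)
# Bonferroni-style adjustment (an assumption): divide alpha by the number of tests.
_, lo_adj, hi_adj = bootstrap_mean_diff_ci(control, treated, alpha=0.05 / n_timepoints)

print(f"effect size: {effect:.2f}")
print(f"unadjusted 95% CI: [{lo95:.2f}, {hi95:.2f}]")
print(f"Bonferroni-adjusted CI: [{lo_adj:.2f}, {hi_adj:.2f}]")
```

      Because both calls use the same seed, the adjusted interval is computed from the same bootstrap distribution and can only be as wide as or wider than the unadjusted one. Whether it still excludes 0 depends on the real data.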

      Then, the fact that TTX applied on top of CTZ drives an increase in mIPSC amplitude is interpreted as a conclusive demonstration that GABAergic scaling is sensing spiking. It is inevitable, however, that TTX will also severely reduce AMPAR activation - a very plausible alternative explanation is that the augmentation of AMPAR activation caused by CTZ is not sufficient to overcome the dramatic impact of TTX. Altogether, these data do not provide substantial evidence for the conclusion drawn by the authors.

      We understand this point when considering the CTZ/TTX experiments by themselves. However, spiking appears to be a more straightforward trigger when the CTZ/TTX results are coupled with the prevention of GABAergic downscaling by optogenetic restoration of spiking in the presence of AMPAR antagonists. Further, an important point here is that our results with TTX vs. TTX + CTZ are different for GABAergic scaling (no difference) and AMPAergic scaling (CTZ diminished upward scaling) suggesting different triggers for the two forms of scaling. We will make this more clear in our resubmission.

      Specific points:

      • The logic of the basis for the argument is somewhat flawed: A homeostat does not require a multiplicative mechanism, nor does it even need to be synaptic. Membrane excitability is a locus of homeostatic regulation of firing, for example. In addition, synapse-specific modulation can also be homeostatic. The only requirement of the homeostat is that its deployment subserves the stabilization of a biological parameter (e.g., firing rate).

      We agree with the reviewer and should not have suggested that this was a necessary requirement for a spike rate homeostat. What we should have said was that historically this definition has been attributed to AMPAergic scaling, which is thought to be a spike rate homeostat. We will correct this in the resubmission.

      • Line 63 parenthetically references an important, but contradictory study as a brief "however". Given the tone of the writing, it would be more balanced to give this study at least a full sentence of exposition.

      Agreed, we will do this.

      • The authors state (line 11) that expression of a hyperpolarizing conductance did not trigger scaling. More recent work ('Homeostatic synaptic scaling establishes the specificity of an associative memory') does this via expression of DREADDs and finds robust scaling.

      The purpose of citing this study was to argue that the spike rate homeostat hypothesis doesn’t make sense for AMPAergic scaling based on a study that hyperpolarized an individual cell while leaving the rest of the network unaltered and therefore leaving network activity and neurotransmission largely normal. In this case scaling was not triggered, suggesting reduced spike rate within an individual cell was insufficient to trigger scaling. The study that the reviewer refers to hyperpolarizes a majority of cells in the network and therefore will also alter neurotransmission throughout the network, which does not separate the importance of spiking and receptor activation as in the above-mentioned study. We will make this point more clearly in the resubmission.

      • Supplemental figure 1 looks largely linear to me? Out of curiosity, wouldn't you expect the left end to be aberrant because scaling up should theoretically increase the strength of some synapses that would have been previously below threshold for detection?

      We agree that the scaling ratio plot is largely linear. To be clear, the linearity of the ratio plot was interesting but our main point here was that this line had a positive slope, meaning ratios (CNQX mPSC amplitudes/control mPSC amplitudes) got bigger for the larger CNQX-treated mPSCs. Alternatively, a multiplicative relationship where mPSCs are all increased by a single factor (e.g. 2X) would be a flat line with 0 slope at the multiplicative value (e.g. 2). In terms of the left side of the plot, we do see values that rise abruptly from 1 - this is partially obstructed by the Y axis in this figure and we will adjust this. This left part of the plot is likely due to the CNQX-induced increases in mPSC amplitudes of minis that were below our detection threshold of 5 pA. Therefore, minis that were 4 pA could now be 5 pA after CNQX treatment and these are then divided by the smallest control mPSCs, which are 5 pA (ratio of 1). We will try to do a better job describing this in the resubmission.
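
      The two behaviours described above (a flat ratio line for purely multiplicative scaling, and a left end that starts near 1 once a detection threshold is applied) can be reproduced in a small simulation. This is only a sketch under stated assumptions: the amplitudes are synthetic draws from a lognormal distribution, the function `ratio_plot` is a hypothetical helper, and matching treated to control events by quantile is assumed as a stand-in for the rank-ordering used to build ratio plots. Only the 5 pA threshold and the 2X example factor come from the text.

```python
import numpy as np

def ratio_plot(control_amps, treated_amps, threshold=5.0):
    """Rank-order ratio of treated to control amplitudes above a detection threshold."""
    control_amps = np.asarray(control_amps, dtype=float)
    treated_amps = np.asarray(treated_amps, dtype=float)
    # Only events at or above the detection threshold are "recorded".
    c = control_amps[control_amps >= threshold]
    t = treated_amps[treated_amps >= threshold]
    # Match the two distributions at the same quantiles (rank-order pairing).
    n = min(c.size, t.size)
    q = (np.arange(n) + 0.5) / n
    c_q = np.quantile(c, q)
    t_q = np.quantile(t, q)
    return t_q, t_q / c_q

rng = np.random.default_rng(0)
# Hypothetical "true" mPSC amplitudes in pA, including sub-threshold events.
true_amps = rng.lognormal(mean=2.5, sigma=0.5, size=5000)
factor = 2.0  # purely multiplicative scaling, as in the 2X example

# Case 1: scale only the events that were detectable in control.
detected = true_amps[true_amps >= 5.0]
_, flat_ratio = ratio_plot(detected, factor * detected)

# Case 2: scale every event, so some sub-threshold minis become detectable.
_, thr_ratio = ratio_plot(true_amps, factor * true_amps)

print("case 1 ratio range:", flat_ratio.min(), flat_ratio.max())
print("case 2 first ratio:", thr_ratio[0], "median ratio:", np.median(thr_ratio))
```

      In case 1 the ratio sits at the scaling factor across the whole distribution. In case 2 the smallest detected events in both conditions lie near the threshold, so the ratio starts close to 1 and rises toward the factor, which matches the explanation given for the left side of the plot.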

      Given that figure 2B also shows warping at the tail ends of similar distributions, how is this to be interpreted?

      The left side of the ratio plot shows evidence consistent with the idea that mIPSCs are dropping into the noise after CNQX treatment (similar to above argument), while most of the distribution suggests mIPSCs are reduced to 50% by CNQX treatment. On the right side of the ratio plot the values appear to mostly increase. We are not sure why this is happening, but it looks like some mIPSCs are not purely multiplicative at 0.5, particularly in TTX. It is also important to point out that this is a relatively small percent of the total population and the biggest mPSCs can vary to a great degree from one cell to the next. We will discuss this in the resubmission.

      • The readability of the figures is poor. Some of them have inconsistent boundary boxes, bizarre axes, text that appears skewed as if the figures were quickly thrown together and stretched to fit.

      We will address these issues in the resubmission.

      • I'm concerned about the optogenetic restoration of activity experiment. Cortical pyramidal neuron mean firing rates are log-normally distributed and span multiple orders of magnitude. The stimulation experiments can only address the total firing at a network level - given that a network-level "mean" is meaningless in a lognormal distribution, how are we to think about the effect of this manipulation when it comes to individual neurons homeostatically stabilizing their own activities? In essence, the argument is made at the single-neuron level, but the experiment is conducted with a network-level resolution.

      As described above, we do not have the capacity to know what the actual firing rate of a particular neuron was before and after introducing a drug and so we cannot absolutely say that we have restored the original firing rates of neurons. However, there is reason to believe that this is achieved to some extent. Our optogenetic stimulation is only 50-100 ms long activating a subset of neurons. This is sufficient to provide a synaptic barrage that then triggers a full blown network burst where the majority of spikes occur, but this is after the light is off. In other words, the optogenetic light pulse only initiates what becomes a normal network burst that fortunately allows the individual cells to express their relatively normal (pre-drug) activity pattern. In our previous study we show that this is the case for individual units - the spiking of an individual unit during a burst is similar before and after CNQX/optostim (see Figure 4b and Suppl. Fig 4 in Fong et al. 2015 Nat. Comm.). We are not claiming that we have restored spiking to exactly the pre-drug state, but bring it back toward those levels and we see this is associated with a return of the mIPSC amplitude to near control levels. We will include a description of this in the resubmission.
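
      The concern about network-level means under a lognormal distribution of firing rates can be made concrete with a short numerical sketch. The parameters below are hypothetical and are not fitted to any recording. The point is only that, for a wide lognormal distribution, the arithmetic mean sits well above the median and is dominated by a small fraction of high-rate neurons.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lognormal firing rates (Hz) for 10,000 neurons.
# mu and sigma are parameters of the underlying normal distribution (natural log).
mu, sigma = 0.0, 1.5
rates = rng.lognormal(mean=mu, sigma=sigma, size=10_000)

mean_rate = rates.mean()
median_rate = np.median(rates)

# Fraction of neurons firing below the network mean.
frac_below_mean = np.mean(rates < mean_rate)

# Share of all spikes contributed by the 10% most active neurons.
n_top = rates.size // 10
top10_share = np.sort(rates)[-n_top:].sum() / rates.sum()

print(f"mean = {mean_rate:.2f} Hz, median = {median_rate:.2f} Hz")
print(f"fraction of neurons below the mean: {frac_below_mean:.2f}")
print(f"share of spikes from the top 10% of neurons: {top10_share:.2f}")
```

      Under these assumptions, a restored network-wide total could hide substantial changes in individual low-rate neurons, which is why the unit-level comparison cited from Fong et al. 2015 bears on this point.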

      • Line 198-99: multiplicativity is not a requirement of a homeostatic mechanism.

      • Line 264-265 - again, neither multiplicativity nor synaptic mechanisms are fundamentally any more necessary for a homeostatic locus than anything else that can modulate firing rate via negative feedback.

      Agreed, see above discussion of homeostat requirement. Will adjust these statements in our resubmission.

      • 277: do you mean AMPAR?

      We were not clear enough here. We actually do mean GABAR. The idea is that CTZ increases network activity and thus increases both AMPAergic and GABAergic transmission. We will clarify this in the resubmission.

      • Example: Figure 1A is frustratingly unreadable. The axes on the raster insets are microscopic, the arrows are strangely large, and it seems unnecessary to fill so much real estate with 4 rasters. Only one is necessary to show the concept of a network burst. The effect of time+CNQX on the frequency of burst is shown in B and C.

      • Example: Figure 2 appears warped and hastily assembled. Statistical indications are shown within and outside of bounding boxes. Axes are not aligned. Labels are not aligned. Font sizes are not equal on equivalent axes.

      We will adjust these issues in the resubmission.

      • The discussion should include mention of the limitations and/or constraints of drawing general conclusions from cell culture.

      We agree and will adjust the discussion. Also, this is why we cited studies that argue GABAergic neurons have a particularly important role in homeostatic regulation of firing following sensory deprivations in vivo.

      • The discussion should include mention of the role of developmental age in the expression of specific mechanisms. It is highly likely that what is studied at ~P14 is specific to early postnatal development.

      We will discuss caveats of cortical cultures at DIV 14-20.

      It is essential to ensure that the data presented in the paper adequately supports the conclusions drawn. A more cautious approach in interpreting the results may lead to a stronger argument and a more robust understanding of the underlying mechanisms at play.

      Agreed.

      Reviewer #2 (Public Review):

      Synaptic scaling has long been proposed as a homeostatic mechanism for the regulation of the activity of individual neurons and networks. The question of whether homeostasis is controlled by neuronal spiking or by the activation of specific receptor populations in individual synapses has remained open. In a previous work, the Wenner group had shown that upscaling of glutamatergic transmission is triggered by direct blockade of glutamate receptors rather than by the concomitant reduction in firing rate (Nat Comm 2015). In this manuscript they investigate the mechanisms regulating scaling of GABA-mediated responses in cortical cell cultures using whole-cell recordings to detect GABAergic currents and multielectrode arrays to monitor global firing activity, and find that spiking plays a fundamental role in scaling.

      Initially, the authors show that chronic blockade (24 h) of glutamatergic transmission by CNQX first reduces spontaneous spiking (at 2 h), but later (24 h) firing grows back towards higher frequencies, suggesting a compensatory mechanism. Then it is shown that either chronic CNQX treatment or TTX causes a reduction in the amplitude of GABAergic mIPSCs. Effects of CNQX on IPSCs are then reverted by replacing spontaneous network firing by chronic optogenetic stimulation of the entire culture, also indicating that GABAergic transmission is homeostatically regulated by global firing. Enhancing glutamatergic transmission with CTZ increases mIPSC amplitude, while addition of TTX in the presence of CTZ causes the opposite effect. Finally, increasing spiking activity using bicuculline also increases mIPSC amplitude, and the authors conclude that spiking activity rather than neurotransmission controls homeostatic GABA scaling. The manuscript shows interesting properties in the regulation of global GABAergic transmission and highlights the important role of spiking activity in triggering GABA scaling. However, it is strongly recommended to address some caveats in order to better support the conclusions presented in the manuscript.

      Major points:

      1) The reason why CNQX does not completely eliminate spiking is unclear (Fig. 1). What is the circuit mechanism by which spiking continues, although at lower frequency, in the absence of AMPA-mediated transmission, and what is the mechanism by which spiking frequency grows back after 24h (still in the absence of AMPA transmission)?

      Is it possible that NMDA-mediated transmission takes over and triggers a different type of network plasticity?

      The bursting in AMPAR blockade is due to the remaining NMDA receptor-mediated transmission. We showed this in our previous study in Suppl. Figures 2 and 6 of Fong et al., 2015 Nat. Comm. Our ability to optically induce normal-looking bursts of spikes was also dependent on NMDAR activation. Further, in Dr Fong’s PhD dissertation it was shown that the bursting activity was abolished when AMPA and NMDA receptors were both blocked. There are likely many factors that contribute to the recovery of activity, and certainly one of them is likely to be the weakening of inhibitory GABAergic currents. These points will be discussed in the resubmission.

      2) A possible activation of NMDARs should be considered. One would think that experiments involving chronic glutamatergic blockade could have been conducted in the presence of NMDAR blockers. Why was this not the case?

      Unfortunately, it was not possible to optogenetically restore normal bursting in the presence of NMDAR blockade (even when AMPAergic transmission was intact), as NMDARs appeared to be critical for the optical restoration of the normal duration of the burst (see Suppl. Figure 6 Fong et al., 2015 Nat. Comm). The reviewer raises an excellent point about a possible NMDAR contribution to altered synaptic strength, however. It is likely that NMDAR signaling is reduced in the presence of CNQX since burst frequency was reduced along with AMPAR-mediated depolarizations. We cannot rule out the possibility that NMDAR signaling could contribute to the alterations in GABAergic mIPSCs and will discuss this in the resubmission. However, previous work suggests that 24/48 hour block of NMDARs (APV) did not trigger AMPAergic scaling in cortical or hippocampal cultures (see Figure 1 Turrigiano et al., 1998 Nature and Suppl. Figure 4 Sutton et al., 2006 Cell). Moreover, our previous study showed that restoring NMDAergic transmission optogenetically, at least to some point, had no influence on AMPAergic scaling (Fong et al., 2015, Nat. Comm.). Regardless, we cannot rule out a role for NMDAergic transmission in GABAergic scaling and this discussion will be included in the resubmission.

      Also, experiments with global ChR2 stimulation with coincident pre and postsynaptic firing might also activate NMDARs and result in additional effects that should be taken into consideration for the global scaling mechanism.

      To be clear, our optical stimulation was turned off before the vast majority of spiking that occurred in the bursts, which played out in a relatively natural manner (see lower panel of Figure 3B optogenetic stimulation – short duration only at onset of burst – we will make this clearer in resubmission). Therefore, we were unlikely to trigger significant synchronous activation that does not normally occur in network bursts.

      3) Cultures exposed to CTZ to enhance AMPA receptors generated variable results (Fig. 5), somewhat increasing spiking activity in a non-significant manner but, at the same time, strengthening mIPSC amplitude. This result seems to suggest that spiking might be involved in GABAergic scaling, but it does not seem to prove it. Then, addition of TTX that blocked spiking reduced mIPSC amplitude. It was concluded here that the ability of CTZ to enhance GABAergic currents was primarily due to spiking, rather than the increase in AMPA-mediated currents. However, in addition to blocking action potentials, TTX would also prevent activation of AMPARs in the presence of CTZ due to the lack of glutamatergic release. Therefore, under these conditions, an effect of glutamatergic activation on GABAergic scaling cannot be ruled out.

      These concerns were very similar to reviewer 1’s first comments. We will address these issues in the resubmission, but to briefly repeat our responses: We are going a step beyond most scaling studies by assessing MEA-wide firing rate, but this still provides an incomplete picture of the particular cells that we target for patch recordings in terms of their firing before and after a drug. Further, we see considerable variability in effect on firing rate from culture to culture, which we will better recognize in the resubmission. Finally, while the CTZ results are not conclusive, taken together with the optogenetic results we think our results are most consistent with the idea that GABAergic scaling is a strong candidate as a spike rate homeostat.

      4) The sample size is not mentioned in any figure. How many cells/culture dishes were used in each condition?

      The individual dots represent either individual cells for mIPSC amplitude or individual cultures in MEA experiments. Number of cultures for figures were: Figure 2 – con = 10, TTX = 3, CNQX = 6, Figure 4 – CNQX = 4, con = 10, CNQX/photostim = 6, Figure 5 – ethanol = 3, CTZ = 3, CTZ + TTX =3, Figure 6 – con = 10, bicuculline = 4. We will include the number of cultures for mIPSC amplitude experiments in the figure legends upon resubmission.

      5) Cortical cultures may typically contain about 5-10% GABAergic interneurons and 90-95 % pyramidal cells. One would think that scaling mechanisms occurring in pyramidal cells and interneurons could be distinct, with different impact on the network. Although for whole-cell recordings the authors selected pyramidal looking cells, which might bias recordings towards excitatory neurons, naked eye selection of recording cells is quite difficult in primary cultures. Some of the variability in mIPSC amplitude values (Fig. 2A for example) might be attributed to the cell type? One could use cultures where interneurons are fluorescently labeled to obtain an accurate representation. The issue of the possible differential effects of scaling in pyramidal cells vs. interneurons and the consequences in the network should be discussed.

      We will include this discussion in the resubmission. Briefly, we chose large cells, which will be predominantly glutamatergic neurons as suggested by the reviewer. Ultimately, even among glutamatergic principal cells there may be variability in the response to drug application. All of these issues could contribute to variability and we will expand our description of the variability in our results, including that based on cellular heterogeneity.

      Reviewer #3 (Public Review):

      This paper concerns whether scaling (or homeostatic synaptic plasticity; HSP) occurs similarly at GABA and Glu synapses and comes to the surprising conclusion that these are regulated separately. This is surprising because these were thought to be co-regulated during HSP and in fact, the major mechanisms thought to underlie downscaling (TTX or CNQX driven), retinoic acid and TNF, have been shown to regulate both GABARs and AMPARs directly. (As a side note, it is unclear that the manipulations used in Joseph and Turrigiano represent HSP, and so might not be relevant). Thus the main result, that GABA HSP is dissociable from Glu HSP, is novel and exciting. This suggests either different mechanisms underlie the two processes, or that under certain conditions, another mechanism is engaged that scales one type of synapse and not the other.

      However, strong claims require strong evidence, and the results presented here only address GABA HSP, relying on previous work from this lab on Glu HSP (Fong, et al., 2015). But the previous experiments were done in rat cultures, while these experiments are done in mice and at somewhat different ages (DIV). Even identical culture systems can drift over time (possibly due to changes in the components of B27 or other media and supplements). Therefore it is necessary to demonstrate in the same system the dissociation. To be convincing, they need to show the mEPSCs for Fig 4, clearly showing the dissociation. Doing the same for Fig 5 would be great, but I think Fig 4 is the key.

      We understand the concern of the reviewer as we do see significant variability within our cultures and they were plated in different places, by different people, in different species (rat vs mouse). Therefore, in the resubmission to strengthen the conclusions we will repeat our optogenetic studies restoring activity in the presence of AMPAergic blockade in our mouse cortical cultures and measuring AMPA mEPSCs to assess scaling.

      The paper also suggests that only receptor function or spiking could control HSP, and therefore if it is not receptor function then it must be spiking. This seems like a false dichotomy; there are of course other options. Details in the data may suggest that spiking is not the (or the only) homeostat, as TTX and CNQX cause identical changes in mIPSC amplitude but have different effects on spiking. Further, in Fig 5, CTZ had a minimal effect on spiking but a large effect on mIPSCs. Similar issues appear in Fig 6, where the induction of increased spiking is highly variable, with many cells showing control levels or lower spiking rates. Yet the synaptic changes are robust, across all cells. Overall, this is not persuasive that spiking is necessarily the homeostat for GABA synapses.

      Together our results argue against AMPAR or GABAR activation as a trigger for GABAergic scaling and that this is different than our results for AMPAergic scaling. These points alone are important to recognize. While changes in spiking do not perfectly follow the changes in GABAergic scaling they do always trend in the right direction. As mentioned above, total spiking activity is only one measure of spiking. It is possible that these drugs alter the pattern of spiking that translates into an altered calcium transient that is important for triggering the plasticity. Again, it is important to note that we are going a step beyond most homeostatic plasticity studies that add a drug and simply assume it is having an effect on spiking (e.g. CNQX was initially thought to completely abolish spiking, but clearly does not). Based on the variability that we observe and the nature of our MEA recordings we cannot precisely determine how the total activity or pattern of activity changes with drug application in the specific cells that we target for whole cell recordings. However, we believe our results are more consistent with our proposal that GABAergic scaling is a strong candidate as a spike rate homeostat. Regardless, in the resubmission we will include a broader discussion about these possibilities, and the reality that there could be multiple homeostatic mechanisms that act to recover spiking activity.

      The paper also suggests that the timing of the GABA changes coincides with the spiking changes, but while they have the time course of the spiking changes and recovery, they only have the 24h time point for synaptic changes. It is impossible to conclude how the time courses align without more data.

      We can only say that by the 24 hour CNQX time point, when overall spiking is recovered, that GABAergic scaling has already occurred. We will state this more clearly in the resubmission.

    2. Reviewer #1 (Public Review):

      In the manuscript titled "GABAergic synaptic scaling is triggered by changes in spiking activity rather than transmitter receptor activation," the authors present an investigation of the role of GABAergic synaptic scaling in the maintenance of spike rates in networks of cultured neurons. Their main findings suggest that GABAergic scaling exhibits features consistent with a key homeostatic mechanism that contributes to the stability of neuronal firing rates. Their data demonstrate that GABAergic scaling is multiplicative and emerges when postsynaptic spike rates are altered. Finally, their data suggest that, in contrast to their prior data on glutamatergic scaling, GABAergic scaling is driven by spike rates. The authors set the paper up as an argument that GABAergic scaling, rather than glutamatergic scaling, serves as the critical homeostatic mechanism for spike rate regulation.

      While the paper is ambitious in its rhetorical scope and certainly presents intriguing findings, there are several serious concerns that need to be addressed to substantiate the interpretations of the data. For example, the CTZ data do not support the interpretations and conclusions drawn by the authors. Summarily, the authors argue that GABAergic scaling is measuring spiking (at the time scale of the homeostatic response, which they suggest is a key feature of a homeostat) yet their data in figure 5B show more convincingly that CTZ does not influence spiking levels - only one out of four time points is marginally significant (also, I suspect that the bootstrapping method mentioned in line 454-459 was conducted as a pairwise comparison of distributions. There is no mention of multiple comparisons corrections, and I have to assume that the significance at 3h would disappear with correction). Then, the fact that TTX applied on top of CTZ drives an increase in mIPSC amplitude is interpreted as a conclusive demonstration that GABAergic scaling is sensing spiking. It is inevitable, however, that TTX will also severely reduce AMPAR activation - a very plausible alternative explanation is that the augmentation of AMPAR activation caused by CTZ is not sufficient to overcome the dramatic impact of TTX. Altogether, these data do not provide substantial evidence for the conclusion drawn by the authors.

      Specific points:

      - The logic of the basis for the argument is somewhat flawed: A homeostat does not require a multiplicative mechanism, nor does it even need to be synaptic. Membrane excitability is a locus of homeostatic regulation of firing, for example. In addition, synapse-specific modulation can also be homeostatic. The only requirement of the homeostat is that its deployment subserves the stabilization of a biological parameter (e.g., firing rate).

      - Line 63 parenthetically references an important, but contradictory study as a brief "however". Given the tone of the writing, it would be more balanced to give this study at least a full sentence of exposition.

      - The authors state (line 11) that expression of a hyperpolarizing conductance did not trigger scaling. More recent work ('Homeostatic synaptic scaling establishes the specificity of an associative memory') does this via expression of DREADDs and finds robust scaling.

      - Supplemental figure 1 looks largely linear to me? Out of curiosity, wouldn't you expect the left end to be aberrant because scaling up should theoretically increase the strength of some synapses that would have been previously below threshold for detection? Given that figure 2B also shows warping at the tail ends of similar distributions, how is this to be interpreted?

      - The readability of the figures is poor. Some of them have inconsistent boundary boxes, bizarre axes, text that appears skewed as if the figures were quickly thrown together and stretched to fit.

      - I'm concerned about the optogenetic restoration of activity experiment. Cortical pyramidal neuron mean firing rates are log-normally distributed and span multiple orders of magnitude. The stimulation experiments can only address the total firing at a network level - given that a network-level "mean" is meaningless in a lognormal distribution, how are we to think about the effect of this manipulation when it comes to individual neurons homeostatically stabilizing their own activities? In essence, the argument is made at the single-neuron level, but the experiment is conducted with a network-level resolution.

      - Line 198-99: multiplicativity is not a requirement of a homeostatic mechanism.

      - Line 264-265 - again, neither multiplicativity nor synaptic mechanisms are fundamentally any more necessary for a homeostatic locus than anything else that can modulate firing rate via negative feedback.

      - 277: do you mean AMPAR?

      - Example: Figure 1A is frustratingly unreadable. The axes on the raster insets are microscopic, the arrows are strangely large, and it seems unnecessary to fill so much real estate with 4 rasters. Only one is necessary to show the concept of a network burst. The effect of time+CNQX on the frequency of burst is shown in B and C.

      - Example: Figure 2 appears warped and hastily assembled. Statistical indications are shown within and outside of bounding boxes. Axes are not aligned. Labels are not aligned. Font sizes are not equal on equivalent axes.

      - The discussion should include mention of the limitations and/or constraints of drawing general conclusions from cell culture.

      - The discussion should include mention of the role of developmental age in the expression of specific mechanisms. It is highly likely that what is studied at ~P14 is specific to early postnatal development.

      It is essential to ensure that the data presented in the paper adequately supports the conclusions drawn. A more cautious approach in interpreting the results may lead to a stronger argument and a more robust understanding of the underlying mechanisms at play.

    1. Reviewer #1 (Public Review):

      In this paper, Scholz and colleagues introduce a new paradigm aimed to bridge the gap between two domains that rely on hierarchical processing: language and memory. They find that, generally in line with their hypotheses, hierarchical processing is associated with activation in hippocampus (especially anterior), medial prefrontal cortex (mPFC), posterior superior temporal sulcus (pSTS), and inferior frontal gyrus (IFG). They also report that these effects in IFG are particularly strong late in the task, once participants have had a lot of experience and processing is presumably more automatic.

      This work has many strengths. The goal to bridge these literatures by developing a new task is commendable. I appreciate also that the authors separately validated their new task behaviorally by comparing it to another accepted as tapping hierarchical processing. I also liked that the authors were transparent about their hypotheses, and certain analyses like the grid coding one that was planned but did not work out. I do however have a number of concerns about the interpretations of the findings, such as whether some patterns are ambiguous as to the true underlying effects. I also have a number of clarification questions. All concerns are described below.

      1. Broadly, I would like to see the authors provide more information and logic on why hierarchical processing should be associated with a big reduction in univariate activation between P1 and P2. Why would this signify item-in-context binding? How does this relate to existing work using other methods (e.g., animal studies, which seem to make predictions more about representational structures)?

      2. There are many differences between what kind of information participants are processing between Position 1 and Position 2 for the HIER but not ITER conditions, and these may not be related to the hierarchical structure specifically. Related to but I think distinct from some of the limitations mentioned in the Discussion is the fact that in the HIER condition, what is happening cognitively between Position 1 and Position 2 items is more distinct (attending to color for position 1, and shape for position 2), whereas the two positions are equivalent in the ITER condition. This is a bit different from the authors' intended manipulation of hierarchy, because it involves a specific dimension. A stronger design might have been to flip the dimensions with respect to position specifically, to make shape sometimes important for position 1, and color for position 2 (perhaps by counterbalancing across subjects, so half would see the current P1=color and P2=shape rules, and the other half P1=shape and P2=color rules). Another important difference between color and shape is that color is a simple binary distinction that participants can make based on their preexisting knowledge of red versus green, and to which they can assign a verbal label, whereas the shape distinction was something novel they acquired during the experiment, has no real-world validity or meaning, and would presumably rely more on visuospatial processing. The shape dimension was also much more variable, I believe. I should say that I do find comfort in a few things: (1) that behavior on this task is correlated with another one that also indexes hierarchy processing, and (2) that the results show regional specificity in a pattern at least not easily explained by this distinction. However, I do think future work will be needed to ask whether it is hierarchy processing per se or rather something to do with the particular cognitive states engaged during each phase in this particular task that is eliciting activation in this set of regions. It would strengthen the paper to discuss this issue directly so readers are alerted to the caveat.

      3. I did not understand what data went into creating the schematic in Figure 2E. First, I think this depiction of a gradient might be easily misinterpreted because it seems to imply that the authors have a higher resolution analysis than they actually do. I believe the data were just analyzed in three subregions of hippocampus - head, body, and tail. Variability within each subregion (as seems to be implied by certain parts of a region being more grey and others more red/orange), is not something that could be assessed in this analysis. For example, why does the medial part of the head seem to be more "unspecific" whereas lateral regions look more HIER Pos1 specific? This type of depiction would only make sense in my mind if the authors had performed something like a voxelwise analysis to determine where specifically the interaction "peaks." I would recommend this visualization be cut or significantly changed to do away with the gradient.

      4. I believe the authors have not reported enough information for us to know that hippocampus involvement indeed does not change with experience. It is interesting that hippocampus in the task x experience ROI analysis shows, if anything, bigger differentiation between the two tasks (numerically) for the late trials. This seems to go against the authors' hypothesis, and a lot of existing data, that hippocampus is preferentially involved in early (vs. late) learning. Given that the key signature in this region, though, is that it differentiates between position 1 and position 2 in HIER but not ITER, and doesn't show a big difference in magnitude across the two tasks, it makes me wonder whether the task x experience interaction collapsing across the two positions makes sense for this region. Did the authors consider a similar task x experience interaction within hippocampus, but additionally considering position? I think there are multiple ways to look at this question (e.g., either looking for a task x experience x position interaction, a task x experience within position 1, a task x position interaction separately in early vs. late portions of the task, or even a position x experience interaction only within the HIER task), and I'm sure the authors would be in a better place to decide on a specific path forward. The same logic might go for mPFC, which shows an interaction but no main effect of task. This relates to claims in the discussion as well, such as that "hippocampus was equally active in early and late trials," but given this analysis is collapsing across the dimension hippocampus (and mPFC) seem to be sensitive to (position), it seems like this could be masking an underlying effect in which hippocampus/mPFC might still be differentially involved early vs. late (i.e., they might show the task x position interaction preferentially during some task phases).

      5. For the IFG regions, the task x experience interaction seems to be driven mainly by change (decrease in activation) for the ITER, rather than change in the HIER. The authors are at times careful to talk about this as "sustained" activity in IFG, which I appreciated, but other times talk about a "relative increase." I am not sure how I feel about that. I see the compelling evidence that there are task differences by experience, and that there is reduction for ITER that is interestingly not present for HIER, but I am still uncomfortable with the term "increase" or even "relative increase" for HIER. For example, couldn't it simply be that the ITER task requires less processing with experience, whereas the HIER does not (perhaps because it requires more processing to begin with)? That is, we do not know whether the reduction for ITER is simply a neural signal thing (i.e., activations diminish over time/experience) or a cognitive thing, specific to the ITER task. I think the authors want to interpret the reductions as the former, but it would be more powerful to demonstrate this with a baseline task that also showed reductions but for which not much would be expected in the way of cognitive change. Can the authors provide more justification for their choice of terminology (through either more logic or analyses), or if not, simply talk about it as sustained activity for HIER, which is especially interesting in the face of reductions for the ITER task?

      6. Please define what is meant by the term "automaticity" in the introduction. A clearer definition of the concept would make the paper generally easier to follow, and it would also help foreshadow the hypotheses about mPFC activity in the introduction. To this end, it could be useful to elaborate on how learning takes place in this task, how it could foster increasing automaticity, and how automaticity maps onto behaviour (e.g., is it RT decrease alone, which happens for both conditions in this task?) and the brain regions discussed.

      7. There was no association between brain and behavior, which the authors interpret as a positive (as therefore task difficulty differences could not explain the effects). However, in light of these null findings, it is on the flip side hard to know whether this neural engagement carries any behavioral significance. It seems to me as though the authors' framework makes predictions about brain-behavior correlations that were not tested in the manuscript. For example, I believe the authors asked whether behavior overall was correlated with activation. However, wouldn't the automaticity in IFG explanation, for example, predict that more engagement or an increase in engagement from early to late should be associated with, e.g., faster RTs, and not necessarily a relationship overall?

      8. On p. 8, it is stated that "In the hippocampus, this effect is driven by higher betas for the presentation of the first object (H1 > I1) and lower betas for the second object (H2 < I2) when comparing across tasks." Can the authors confirm whether the pairwise comparisons following up on the interaction here are significant, or rather if they are referring to a numerical difference in the betas? It looked like the same (numerically) would be true for mPFC; is there a reason why the same information is not included for the mPFC ROI? Also, might the authors provide more speculation as to why one might see both enhanced and reduced activation for P1 and P2, respectively?

      9. I was expecting some discussion of how hippocampus does not seem to show preferential involvement early, given that its potential role being restricted to early in learning (i.e., during acquisition only) was one of the primary motivators for using this task. As noted in my above comment (#4), I am not quite sure that I think there is evidence that the hippocampal role remains constant over this task, given the analyses provided (i.e., that they did not look at the position effect for early vs. late). However upon further analysis if it does seem to be more stable, and/or if it even increases over experience, the authors might want to talk about that in the Discussion.

      10. The fact that the hierarchies in this paradigm unfolded over time makes them distinct on some level from the hierarchies present in the VRT task that was used to validate the HIER task's hierarchical processing demands. For example, there might be additional computations required to process these temporally ordered structures, support online maintenance, and so on. It may be worth considering this aspect of the task, and whether/to what extent the results could be related to it, in the paper.

      11. I also have many methodological and analytic clarification questions, which I detail in the recommendations for authors.

    1. It may already be clear that ethical conflict in psychological research is unavoidable. Because there is little, if any, psychological research that is completely risk free, there will almost always be conflict between risks and benefits. Research that is beneficial to one group (e.g., the scientific community) can be harmful to another (e.g., the research participants), creating especially difficult trade-offs. We have also seen that being completely truthful with research participants can make it difficult or impossible to conduct scientifically valid studies on important questions.   Of course, many ethical conflicts are fairly easy to resolve. Nearly everyone would agree that deceiving research participants and then subjecting them to physical harm would not be justified by filling a small gap in the research literature. But many ethical conflicts are not easy to resolve, and competent and well-meaning researchers can disagree about how to resolve them. Consider, for example, an actual study on “personal space” conducted in a public men’s room (Middlemist, Knowles, & Matter, 1976). The researchers secretly observed their participants to see whether it took them longer to begin urinating when there was another man (a confederate of the researchers) at a nearby urinal. While some critics found this to be an unjustified assault on human dignity (Koocher, 1977), the researchers had carefully considered the ethical conflicts, resolved them as best they could, and concluded that the benefits of the research outweighed the risks (Middlemist, Knowles, & Matter, 1977). For example, they had interviewed some preliminary participants and found that none of them was bothered by the fact that they had been observed.   The point here is that although it may not be possible to eliminate ethical conflict completely, it is possible to deal with it in responsible and constructive ways. 
In general, this means thoroughly and carefully thinking through the ethical issues that are raised, minimizing the risks, and weighing the risks against the benefits. It also means being able to explain one’s ethical decisions to others, seeking feedback on them, and ultimately taking responsibility for them.

      It would be beneficial to speak a bit more about what was achieved by unethical studies. For example, we do tests on rats and that's not completely ethical, right? So are there any studies that weren't ethical, but from which we learned a lot, that we could add to the conversation? Was there a benefit to deceiving participants? I think an example of this could prompt readers to consider whether there is a reason some researchers fight ethics boards to do studies that may not be entirely ethical. You could also add that most of the time there is a way to get rid of an unethical part of a study. For example, in the study by Lahaut, was there a need to visit people's houses multiple times, or could they have just offered an incentive?

    1. Background Reproducibility of data analysis workflow is a key issue in the field of bioinformatics. Recent computing technologies, such as virtualization, have made it possible to reproduce workflow execution with ease. However, the reproducibility of results is not well discussed; that is, there is no standard way to verify whether the biological interpretation of reproduced results is the same. Therefore, it still remains a challenge to automatically evaluate the reproducibility of results. Results We propose a new metric, a reproducibility scale of workflow execution results, to evaluate the reproducibility of results. This metric is based on the idea of evaluating the reproducibility of results using biological feature values (e.g., number of reads, mapping rate, and variant frequency) representing their biological interpretation. We also implemented a prototype system that automatically evaluates the reproducibility of results using the proposed metric. To demonstrate our approach, we conducted an experiment using workflows used by researchers in real research projects and the use cases that are frequently encountered in the field of bioinformatics. Conclusions Our approach enables automatic evaluation of the reproducibility of results using a fine-grained scale. By introducing our approach, it is possible to evolve from a binary view of whether the results are superficially identical or not to a more graduated view. We believe that our approach will contribute to more informed discussion on reproducibility in bioinformatics.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giad031 ) , which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer Stian Soiland-Reyes

      Hi, I am Stian Soiland-Reyes https://orcid.org/0000-0001-9842-9718 and have pledged the Open Peer Review Oath https://doi.org/10.12688/f1000research.5686.2:

      Principle 1: I will sign my name to my review. Principle 2: I will review with integrity. Principle 3: I will treat the review as a discourse with you; in particular, I will provide constructive criticism. Principle 4: I will be an ambassador for the practice of open science. This review is licensed under a Creative Commons Attribution 4.0 International License.

      This article presents a method for comparing reproducibility of computational workflow runs captured as RO-Crates, by calculating a set of genomics metrics ("features") and adding these to the crate's metadata. Overall I find this a valuable contribution and worthy of publication with GigaScience, primarily as a way for users of workflow systems CWL, Nextflow, Cromwell or Snakemake to ensure reproducibility, but also for workflow engine developers who may want to build on this methodology to improve their provenance support. In general the method proposed is sound; however, it does have some limitations and inherent assumptions that are not highlighted sufficiently in the current manuscript, particularly concerning the selection of features and the reproducibility of the metrics calculation itself. I have detailed this with some points below that I would like the authors to clarify in a minor revision.

      --- Note - the below questions from GigaScience Reviewer Guidelines mainly relate to data, but I also here interpret them for the software described.

      Q1: Is the rationale for collecting and analyzing the data well defined? The authors' workflow executions https://doi.org/10.5281/zenodo.7098337 are based on three third-party bioinformatics workflows. Although they are not particularly "large-scale", they are representative best-practice pipelines in this field (data sizes from 200 MB to 6 GB) and also fairly representative for scalable workflow systems (Nextflow, CWL and WDL) used by bioinformaticians.

      Q2: Is it clear how data was collected and curated? It is not explicit in the text why these particular workflows were selected, beyond being realistic pipelines used in research. I would suggest something like "these workflows have been selected as fairly representative and mature current best-practice for sequencing pipelines, implemented in different but typical workflow systems, and have similar set of genomics features that we can assess for provenance comparison." The workflows have each been cited, but I would appreciate some consistency so that each workflow is cited both by its closest journal article and as their original download sources (e.g. GitHub).

      Q3: Is it clear - and was a statement provided - on how data and analyses tools used in the study can be accessed? Yes, full availability statements have been provided both for data and software, archived on Zenodo for longevity.

      Q4: Are accession numbers given or links provided for data that, as a standard, should be submitted to a community approved public repository? Yes, the tools have been added to https://bio.tools/ -- I don't think it's necessary to further register the data outputs with accession numbers. RRIDs for tools can be considered at a later stage, perhaps only for Sapporo.

      Q5: Is the data and software available in the public domain under a Creative Commons license? Yes, the software and dataset is open source under Apache License, version 2.0. The dataset https://doi.org/10.5281/zenodo.7098337 embeds existing workflows and data, however this is OK as included resources such as the rnaseq Nextflow workflow have compatible licenses (MIT) or are also Apache-licensed. The manuscript has software citations for two of the workflows, but this is missing for the CWL workflow, which is only cited by manuscript (33) (also missing DOI). It is unclear if any of the workflows are registered in https://workflowhub.eu/ but that should primarily be done by their upstream authors. The RO-Crates in https://doi.org/10.5281/zenodo.7098337 don't include any licensing and attribution for the embedded workflows, and its metadata file is misleadingly declaring the crate license as CC0 public domain. While CC0 is appropriate for examples and metadata file itself, the embedded MIT/Apache workflows from third parties can't legally be relicensed in this way and should have their original licenses declared. See https://www.researchobject.org/ro-crate/1.1/contextualentities.html#licensing-access-control-and-copyright I understand these RO-Crates are generated automatically by Sapporo, which does not directly understand licensing, and for documenting the test runs with Sapporo, I think these should not be modified post-execution. Pending further license support by Sapporo, perhaps a manual outer RO-Crate that aggregate these (e.g. adding a direct top-level ro-crate-metadata.json to the Zenodo entry) can provide more correct metadata as well as workflow citations. The authors could add to Discussion some consideration on (lack of) propagation of such metadata for auto-generated crates as part of workflow run provenance. 
For instance, if a workflow run was initiated from a Workflow Crate https://w3id.org/workflowhub/workflow-ro-crate/ at WorkflowHub, its license, attributions and descriptions could be carried forward to the final Workflow Run Crate provenance together with the Sapporo-calculated features.

      Q6: Are the data sound and well controlled? Yes, the data is sound. The testing on Mac gives null results, but the authors explain the workflows failed to execute there due to architectural differences, which is flagged as a valid concern for reproducibility. It may be worth further investigating whether this is due to misconfiguration on that particular test machine, in which case these columns should be removed.

      Q7: Is the interpretation (Analysis and Discussion) well balanced and supported by the data? The authors' discussion has some implicit assumptions that should be made more clear, together with their implications. The Tonkaz tool assumes the workflow execution has already extracted the features and added them to the RO-Crate. This assumes the right features have been correctly extracted by each execution. Feature extraction also depends on bioinformatics tools that are subject to change/updates. Newer versions of Sapporo-service, and in particular any non-Sapporo executors also making Workflow Run Crates, may have a different feature selection. Being able to fairly compare two workflow runs therefore depends on careful control of the Sapporo executor versions so that they have consistent feature selection. This means the reproducibility metric proposed has a potential reproducibility challenge itself. This is not to say that the approach is bad, as the feature extraction is using predictable measures such as counting sequences, rather than heuristics. This means Future Work should point out the need for guidelines on what kind of features should be selected, to ensure they are consistent and reproducible. The set of features also depends on the type of data and class of analysis. As a minimum, the RO-Crate should therefore include provenance of that feature extraction, noting the Sapporo version, and ideally the version of the tools used for that. The authors may want to consider if feature extraction should be a separate workflow (e.g. in CWL), that itself can be subject to the same reproducibility preservation measures, and therefore also can be performed post-execution as part of Tonkaz' comparison or as a curation activity when storing Workflow Run Crates.

      Q8: Are the methods appropriate, well described, and include sufficient details and supporting information to allow others to evaluate and replicate the work? Yes, it was very easy to replicate the Tonkaz analysis of the workflow run crate that is already provided, as it is provided also as a Docker container. The Docker container is provided as part of GitHub releases, and so is not at risk of Docker Hub's automatic deletion. I have not tried installing my own Sapporo service to re-execute the workflow, but detailed installation and run details are provided in the README of both Tonkaz https://github.com/sapporowes/tonkaz#readme and sapporo-service https://github.com/sapporowes/sapporo/blob/main/docs/GettingStarted.md

      Q9: What are the strengths and weaknesses of the methods? The method provided is strong compared to naive checksum-based comparison of workflow outputs, which has been pointed out as a challenge by previous work. The advantage of the feature extraction is that the statistics can be compared directly and any discrepancies can be displayed to the user at a digestible high level. The disadvantage is that this depends wholly on the selection of features, which must be done carefully to cover the purpose of the particular workflow and its type of data. For instance, a workflow that generates diagrams of sequence alignments could not be sufficiently tested in the suggested approach, as analyzing the diagram for correctness would require tools that may not even exist. Perhaps feature extraction should be a part of the workflow itself, so it can self-determine what is important for its analysis? The current approach also is quite sensitive to output data filenames, so changes in filename would mean features are not compared, even where such files are equivalent. This should be made more explicit in the manuscript; for instance, workflows should ensure they don't include timestamps or random identifiers in their filenames. Further work could have a deeper understanding of the workflow structure to compare outputs based on their corresponding FormalParameter in the RO-Crate.
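
      The contrast drawn in Q9 between feature-based comparison and filename sensitivity can be illustrated with a minimal Python sketch. This is not Tonkaz's implementation: the function name, the status labels, the 5% tolerance, and the "mappingRate" key are assumptions made for illustration only.

```python
import math


def compare_features(run_a, run_b, rel_tol=0.05):
    """Compare two runs given as {filename: {feature_name: number}}.

    Returns {filename: status}, where status is 'same', 'similar',
    'different', or 'unmatched' (file present in only one run).
    The labels and the tolerance are hypothetical, not Tonkaz's scale.
    """
    report = {}
    for name in sorted(set(run_a) | set(run_b)):
        if name not in run_a or name not in run_b:
            # Files are paired by name only, so a renamed file is never compared.
            report[name] = "unmatched"
            continue
        fa, fb = run_a[name], run_b[name]
        if fa == fb:
            report[name] = "same"
        elif fa.keys() == fb.keys() and all(
            math.isclose(fa[k], fb[k], rel_tol=rel_tol) for k in fa
        ):
            report[name] = "similar"
        else:
            report[name] = "different"
    return report


# Hypothetical feature values for three runs of the same workflow.
run_1 = {"sample.bam": {"totalReads": 1000000, "mappingRate": 0.951}}
run_2 = {"sample.bam": {"totalReads": 1000000, "mappingRate": 0.949}}
run_3 = {"sample_20230901.bam": {"totalReads": 1000000, "mappingRate": 0.951}}

print(compare_features(run_1, run_2))  # small numeric drift in one feature
print(compare_features(run_1, run_3))  # identical features, different filename
```

      In this sketch the third run carries the same feature values as the first, yet both files are reported as unmatched because a timestamp in the filename prevents pairing, which is the limitation noted above.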

      Q10: Have the authors followed best-practices in reporting standards? Yes, the details provided are at a sufficient detail level, and the authors have re-used the RO-Crate data packaging. The RO-Crates created by Sapporo-service add several terms for the metrics, which are declared on the @context according to RO-Crate specs https://www.researchobject.org/rocrate/1.1/appendix/jsonld.html#extending-ro-crate However, the terms point to GitHub "raw" pages, which are not particularly stable, and may change depending on Sapporo versions and GitHub's repository behaviour. I recommend changing the ad-hoc terms to PIDs such as a namespace under https://w3id.org/ or https://purl.org/ so that these terms can be stable semantic artefacts, e.g. submitting them to https://github.com/ResearchObject/ro-terms to register https://w3id.org/ro/terms/sapporo#WorkflowAttachment that can be used instead of https://raw.githubusercontent.com/sapporo-wes/sapporo-service/main/sapporo/roterms.csv#WorkflowAttachment or alternatively https://w3id.org/sapporo#WorkflowAttachment could be set up to redirect to the ro-terms.csv on GitHub (discussed with the authors at ELIXIR Biohackathon). In doing so you should separate the terms into two namespaces: the general Sapporo terms like "sha512", and the particular genomics feature sets including "totalReads" (e.g. https://w3id.org/datafeatures/genomics#WorkflowAttachment), as the second are (a) not Sapporo-specific and (b) domain-specific. RO-Crate is developing Workflow Run profiles https://www.researchobject.org/workflow-runcrate/profiles/. Although these have not been released at the time of my review, they are now stable, so the authors may want to check https://www.researchobject.org/workflow-runcrate/profiles/workflow_run_crate to ensure "FormalParameter" are declared correctly in the generated RO-Crate as separate entities, linked from the "File" using "exampleOfWork".
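
      The term change recommended in Q10 amounts to swapping the IRI prefix in the crate's @context. A minimal Python sketch follows, assuming the context is available as a flat term-to-IRI mapping. The function name is hypothetical, and the w3id.org namespace is the reviewer's proposal, so it is not guaranteed to resolve.

```python
import json

# Prefix currently used by the generated crates (quoted in the review).
RAW_PREFIX = (
    "https://raw.githubusercontent.com/sapporo-wes/sapporo-service/"
    "main/sapporo/roterms.csv#"
)
# Namespace proposed in the review; an assumption until it is registered.
PID_PREFIX = "https://w3id.org/ro/terms/sapporo#"


def stabilise_context(context_terms):
    """Return a copy of a term -> IRI mapping with raw GitHub IRIs
    replaced by IRIs under the proposed persistent namespace."""
    return {
        term: (PID_PREFIX + iri[len(RAW_PREFIX):]) if iri.startswith(RAW_PREFIX) else iri
        for term, iri in context_terms.items()
    }


# Hypothetical excerpt of a crate's extra @context terms.
terms = {
    "WorkflowAttachment": RAW_PREFIX + "WorkflowAttachment",
    "name": "http://schema.org/name",
}
print(json.dumps(stabilise_context(terms), indent=2))
```

      Terms that do not start with the raw GitHub prefix are left untouched. Splitting the genomics feature terms into a second namespace, as the review suggests, would need a second prefix and a list of which terms belong to it.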

      Q11: Can the writing, organization, tables and figures be improved? The language and readability of this article is generally very good. Light copy-editing may improve some of the sentences, e.g. reducing the use of "Thus" phrases.

      Q12: When revisions are requested. See suggestions from above for minor revisions: Make explicit why these 3 workflows were selected (see Q2). Make pipeline software citations consistent in manuscript (see Q2, Q5). Avoid declaring CC0 within generated RO-Crate -- move this to only apply to the ro-crate-metadata.json. Add an outer RO-Crate metadata file to Zenodo deposit to carry the correct licenses and pipeline licenses for each of rnaseq_1st.zip, trimming.zip etc. Improve discussion to better reflect limitations of the features and its own reproducibility issues (see Q7, Q9). Consider improvements to the RO-Crate context (see Q10) -- this may just be noted as Future Work in the manuscript rather than regenerating the crates. In addition: p2: Add citation for claim on file checksums differing depending on software versions etc. (for instance https://doi.org/10.1145/3186266). p3: "We converted Sapporo's provenance into RO-Crate" -- re-cite (20) as this is the paragraph explaining what it is. p10: Citations 7, 8 are missing authors. p10: Citation 15 is now published, replace with https://doi.org/10.1145/3486897 p0: Citations 28, 33 are missing DOI.

      Q13: Are there any ethical or competing interests issues you would like to raise? No, the third-party pipelines selected for reproducibility testing are already published and are here represented fairly, and only used as executable methods (as intended by their original authors), which I would say do not need ethical approval.

    1. Background Integration of data from multiple domains can greatly enhance the quality and applicability of knowledge generated in analysis workflows. However, working with health data is challenging, requiring careful preparation in order to support meaningful interpretation and robust results. Ontologies encapsulate relationships between variables that can enrich the semantic content of health datasets to enhance interpretability and inform downstream analyses. Findings We developed an R package for electronic Health Data preparation ‘eHDPrep’, demonstrated upon a multi-modal colorectal cancer dataset (n=661 patients, n=155 variables; Colo-661). eHDPrep offers user-friendly methods for quality control, including internal consistency checking and redundancy removal with information-theoretic variable merging. Semantic enrichment functionality is provided, enabling generation of new informative ‘meta-variables’ according to ontological common ancestry between variables, demonstrated with SNOMED CT and the Gene Ontology in the current study. eHDPrep also facilitates numerical encoding, variable extraction from free-text, completeness analysis and user review of modifications to the dataset. Conclusion eHDPrep provides effective tools to assess and enhance data quality, laying the foundation for robust performance and interpretability in downstream analyses. Application to a multi-modal colorectal cancer dataset resulted in improved data quality, structuring, and robust encoding, as well as enhanced semantic information. We make eHDPrep available as an R package from CRAN [[URL will go here]].

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giad030 ), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer Janna Hastings

      The manuscript describes a toolkit for the automated semantic enrichment and quality control of electronic health data using ontologies. This is a much needed utility that will add value to electronic data sharing and re-use for many different purposes including the development of machine learning for medical applications and personalised medicine. Overall the manuscript is well written and the functionality offered by the toolkit is well thought out and motivated. The internal consistency checks and the use of ontology-based information content to semantically aggregate variables into more informative meta-variables are particularly welcome functions.

      However, I recommend that the description of the tool functionality be clarified in some points, and the evaluation could be strengthened.

      Page 6-7, internal consistency:

      1. How should the user specify semantic dependencies between variable pairs? Would it not be helpful to use a standard format for this specification to enable interoperability and re-use of such specifications?

      2. Should the specification of semantic relationships between variables not be linked to the knowledge from the ontologies? Ontologies are able to represent many different types of logical relationships between classes, which make them ideal for then serving as a standard and interoperable format for specifying this type of constraint. Rules are another promising standard approach for logic-based knowledge representation.

      Page 11, figure 4 a: I think it would be informative for evaluating the operation of the tool if the heatmap of variable missingness after application of the tool could also be illustrated beside the current Fig 4a.

      Page 13, ontology preparation: The paragraph describes what the authors have done to prepare ontologies for use with the tool. Is this preparation procedure also necessary for users to follow when they use the eHDPrep tool? How can alternative ontologies be incorporated (which may be useful for other domains)?

      Evaluation: The biggest shortcoming of the presented manuscript is that the evaluation is limited to the application of the tool to one dataset and subsequent manual evaluation of the outcome by one group, the study authors.

      The results as presented are positive, but there is a significant risk that the tool performs well on this task, as assessed by these study authors, but then fails to generalise to other tasks and datasets that future users might wish to use it with. To mitigate this risk, it would be optimal if somewhat more independent methods could be found for evaluating the performance of the different aspects of the tool. One approach could be a rigorous comparison of this tool's performance against the performance of other tools that have similar functionality, e.g. comparison of the semantic aggregation function with other tools that find and recommend MICAs. An alternative approach might be to apply the tool to an additional dataset for which a group outside of the study authors would be prepared to provide an independent evaluation.

    1. Background Eukaryotic gene expression is controlled by cis-regulatory elements (CREs), including promoters and enhancers, which are bound by transcription factors (TFs). Differential expression of TFs and their binding affinity at putative CREs determine tissue- and developmental-specific transcriptional activity. Consolidating genomic data sets can offer further insights into the accessibility of CREs, TF activity, and, thus, gene regulation. However, the integration and analysis of multi-modal data sets are hampered by considerable technical challenges. While methods for highlighting differential TF activity from combined chromatin state data (e.g., ChIP-seq, ATAC-seq, or DNase-seq) and RNA-seq data exist, they do not offer convenient usability, have limited support for large-scale data processing, and provide only minimal functionality for visually interpreting results. Results We developed TF-Prioritizer, an automated pipeline that prioritizes condition-specific TFs from multi-modal data and generates an interactive web report. We demonstrated its potential by identifying known TFs along with their target genes, as well as previously unreported TFs active in lactating mouse mammary glands. Additionally, we studied a variety of ENCODE data sets for cell lines K562 and MCF-7, including twelve histone modification ChIP-seq as well as ATAC-seq and DNase-seq datasets, where we observe and discuss assay-specific differences. Conclusion TF-Prioritizer accepts ATAC-seq, DNase-seq, or ChIP-seq and RNA-seq data as input and identifies TFs with differential activity, thus offering an understanding of genome-wide gene regulation, potential pathogenesis, and therapeutic targets in biomedical research.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giad026 ), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer Kaixuan Luo

      This paper develops a novel pipeline, TF-Prioritizer, to prioritize condition-specific TFs through integrative analysis of histone modification (HM) ChIP-seq and RNA-seq data. The pipeline integrates multiple computational tools: it calculates TF binding site affinities and links candidate binding sites to genes using TRAP and TEPIC. It uses DYNAMITE, a sparse logistic regression classifier, to infer TFs related to differential gene expression between conditions. It computes an aggregated score, the "TF-TG score", to score TFs from multiple types of evidence, and obtains a prioritized list of TFs from all histone modifications using a discounted cumulative gain ranking approach. It also provides additional functionality and a web interface to visualize the results.

      Overall, the pipeline could be very useful for biologists with a user-friendly web application to automate the entire process from data preprocessing to statistical analysis and obtain interactive reports to gain novel biological insights. However, more systematic evaluations are needed to demonstrate the benefits of this pipeline.

      Major comments:

      1. In the computation of an aggregated score "TF-TG score", it uses a multiplicative function to combine differential expression (absolute log2FC), TF-Gene scores computed from TEPIC, and the total coefficients computed from DYNAMITE. One concern about this approach is that it may miss some TFs with support from only one or two types of evidence. In Fig 5, we see diffTF identifies a lot more TFs than TF-Prioritizer. I don't think we can conclude that diffTF is less specific than TF-Prioritizer simply based on the number of TFs prioritized. Some of the TFs identified only by diffTF may be important but missed by TF-Prioritizer? I would like to see more detailed analysis comparing the lists of TFs identified by diffTF and TF-Prioritizer. Other evidence or metrics in addition to the number of prioritized TFs would be helpful to evaluate the plausibility of the prioritized lists of TFs.

      2. It is hard to interpret and evaluate the contribution of the evidence for prioritized TFs. Figure 6b is helpful, but it is unclear how the users would be able to evaluate the contribution of the components. Does the software run each of the combinations separately and output a list of prioritized TFs under each combination?

      3. The TEPIC2 paper has already developed a very comprehensive pipeline, including TF affinity calculation by TRAP and computation of TF gene scores by TEPIC, as well as logistic regression to identify TFs between conditions by DYNAMITE, and it is already well parallelized. The authors should clearly list the novel contributions from this work. It would be helpful to have a table comparing the functionalities and technical features between TF-Prioritizer and TEPIC2.

      4. The software takes histone modification ChIP-seq and RNA-seq data as input. It will significantly improve the usage of the software if it supports DNase-seq and/or ATAC-seq, which are widely used. If this software could take ATAC-seq or DNase-seq data as input, it is important to include those data types and provide some examples to illustrate the usage and performance.

      5. The software combines multiple histone modification ChIP-seq datasets using a discounted cumulative gain ranking approach. However, different types of histone modifications have different epigenomic functions and different combinations indicate different chromatin states. Some TFs may be only enriched in a small subset of histone modifications (already discussed by the authors) and may be missed by the simple discounted cumulative gain ranking approach. The authors should provide prioritized TFs from each histone modification ChIP-seq dataset, and evaluate which TFs were prioritized by all the combined datasets, and which TFs by only one dataset. Also, some ChIP-seq datasets may be of poor quality. Does the software provide other options to rank the TFs from different epigenomic datasets? e.g. set different weights for different epigenomic datasets, etc.

      6. The authors conducted co-occurrence analysis based on the overlapping of peaks. It is unclear if the method would calculate some statistical measure (e.g. p-value) for the significance of co-occurrence. Also, since the TRAP model generates a quantitative measure of TF binding affinity, I am curious to see if the quantitative TF binding affinities are also correlated for those co-occurring binding sites.
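
      One way to attach a p-value to peak co-occurrence, as asked in comment 6, is a one-sided hypergeometric test. The following is a minimal sketch under stated assumptions, not the method used by TF-Prioritizer: the function name and the four counts (total candidate regions, regions bound by each TF, regions bound by both) are hypothetical inputs chosen for illustration, and regions are assumed to be independent and equally likely to be bound.

```python
from math import comb


def cooccurrence_pvalue(n_total, n_a, n_b, n_both):
    """One-sided hypergeometric tail P(X >= n_both).

    Hypothetical inputs (not taken from the manuscript):
      n_total: number of candidate regions considered
      n_a:     regions with a peak for TF A
      n_b:     regions with a peak for TF B
      n_both:  regions with peaks for both TFs
    """
    upper = min(n_a, n_b)
    # Number of ways to choose the n_a regions of TF A among all regions.
    denom = comb(n_total, n_a)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_b, k) * comb(n_total - n_b, n_a - k)
        for k in range(n_both, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy numbers only: 10 regions, 3 per TF, all 3 shared.
    print(cooccurrence_pvalue(10, 3, 3, 3))
```

      A small p-value here would only indicate more overlap than expected by chance under the stated independence assumption; correlated chromatin accessibility between regions would need a more careful background model.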

      Minor comments:

      1. In Figure 1, it would be helpful to highlight which steps were already implemented in existing tools (and label the tools used), and which steps are novel in this study.

      2. H3K4me3 data seems to be missing in the L10 time point. How does the method handle missing data?

      3. It is unclear how the Pol2 ChIP-seq data was used in this study? Was it included in the model or only in the downstream analysis?

      4. It is hard to interpret the browser tracks of the TF predictions ("Predicted xxx") in Figure 3 and 4. Please add more details about those tracks.

      5. Figure 6, the authors should provide more details to help understand this figure, especially panel b. The figure legend is too short.
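
      The concern in major comment 1 can be made concrete with a toy calculation. The sketch below is an illustration only and does not reproduce the TF-Prioritizer implementation: both function names are hypothetical, the three inputs are simplified stand-ins for the absolute log2FC, the TEPIC TF-Gene score, and the DYNAMITE coefficient, and the averaging alternative is an assumption introduced purely for contrast.

```python
def multiplicative_score(abs_log2fc, tf_gene_score, dynamite_coef):
    """Product of the three evidence types (simplified stand-in)."""
    return abs_log2fc * tf_gene_score * abs(dynamite_coef)


def mean_score(abs_log2fc, tf_gene_score, dynamite_coef):
    """Hypothetical alternative: plain average of the same evidence."""
    return (abs_log2fc + tf_gene_score + abs(dynamite_coef)) / 3


if __name__ == "__main__":
    # Toy TF: strong expression change and binding score, but a sparse
    # regression has shrunk its coefficient to exactly zero.
    evidence = (2.0, 0.8, 0.0)
    print("multiplicative:", multiplicative_score(*evidence))
    print("mean:", mean_score(*evidence))
```

      Under a product, a single zero removes the TF from the ranking regardless of the other two values, which is the behaviour the comment asks the authors to examine.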

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1: Major comments: The key point of the manuscript is to provide resources for the plant community. The motivation for selecting these specific promoters, how they were obtained and cloned, what they are in detail and how they will be made publically available is all clearly described. The infection experiments presented in it are an added bonus and a proof of concept of the applicability of the system.

      Thank you very much.

      Minor comments: The promoter sequences will probably be included in the AddGene submission, however, it might be helpful to also deposit the promoter sequences at e.g. GenBank.

      Indeed, we have sent all sequence files to AddGene and they will be available for download there. We will look into transferring them to GenBank as well. We have not done this before, but are generally always supportive of maintaining data in open repositories.

      Line 133: "There are few exceptions to this rule...". It would probably be helpful to list/mark these exceptions in Table 1.

      We agree. We have now marked them in the table, and included the sentence “There are a few exceptions to this rule (marked with a * in the ‘Bases’ column in table 2), where we used a defined stretch of DNA that has previously been described to complement a mutant” in lines 135-137.

      Line 138: "A overhangs". In the GreenGate system, A-modules (promoters) are flanked by A- (5') and B- (3') overhangs (applies to line 144, too). Also, the B-overhang listed here (TTGT) is the reverse complement, which might be confusing for readers.

      A very good point. We have modified these lines to “standard four base pair GreenGate promoter module overhangs (5´-ACCT and TTGT-3´) were added via primers during amplification of the promoter sequences (see Supplementary Table 1 for a list of primer sequences. Note that TTGT is the complementary sequence of the A-to-B-module overhang, as this is added via the reverse primer)” in lines 141-144.
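
      Because the exchange above turns on the difference between a complement and a reverse complement, a short check may help readers. This is a minimal sketch: the helper names are hypothetical, only the string TTGT is taken from the text, and no claim is made here about how the overhang is written in the GreenGate documentation.

```python
# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-wise complement, same left-to-right order as the input."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement read in the opposite direction."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    overhang = "TTGT"  # sequence quoted in the rebuttal
    print("complement:", complement(overhang))
    print("reverse complement:", reverse_complement(overhang))
```

      Stating the strand orientation (5´ or 3´) next to each four-base sequence, as the revised sentence does, avoids the ambiguity between the two readings.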

      Line 149 ff.: How many lines have been established per promoter tested? Did they all yield a similar expression pattern?

      This is indeed a very important point which was somehow lost along the way during manuscript preparations, after being moved around between results and methods section. We have put it back in in lines 162-165 as “We recovered several independent transgenic lines for the PEP1 and 2, PEPR1 and 2, as well as BIK1 and RBOHD reporters. Out of those, a minimum of three (RBOHD) and up to seven (PEPR2) independent lines showed fluorescence, and out of those, all individual lines for each reporter showed the same expression patterns.”

      Line 163: As someone not being familiar with microscoping Arabidopsis roots, I'm wondering how the authors can be sure that the tissue in question is the vasculature. Is this obvious for experts in the field?

      Of course, we can’t give a totally objective answer here, but we believe that by including the transmitted light image next to the fluorescence image, it is indeed visible that the fluorescence is limited to the center of the root, not the complete circumference. At the same time, it is important to note that all images are stereomicroscopic images, not confocal images. Thus, it is indeed not possible to, e.g., conclude if pericycle cells are included or excluded in the region with expression. So, while it is, we believe, safe to assume that it is vascular cells, we can’t determine which cell types in the vascular cylinder are expressing the reporters. This would require confocal imaging, which would increase the resolution, but at the expense of a good overview, which we think is more valuable for such a proof-of-principle.

      Discussion: Is there by any chance prior (cell-resolution) knowledge about the expression behaviour of any of the investigated promoters? E. g. by in-situ hybridizations? If so, do the expression patterns match?

      No, the expression of these reporters in direct response to fungal infection has so far only been studied by transcriptomics.

      Presentation and quality of the images need be improved. Scale bars are missing in all confocal images. In Figure 3 and 4, the name of genes examined can be labeled on the image, which will make it easier for readers. In addition, key information such as the inoculum and sampling time point after fungal inoculation should be described in the legend or the main text.

      We have added the scale bars and gene names into the images. We agree that the gene names make it easier for the reader. Further, we have added the inoculum and sampling time to the legend.

      More importantly, a "mock" inoculation or "before fungal inoculation" should be performed to reveal the expression changes of the marker genes after fungal inoculation.

      This information was provided in the text and via the supplemental figures, but I assume we didn’t make it clear that these results and images were indeed specific control/mock experiments, and not some ‘general’ expression analysis. We have now tried to make this clearer, specifically in lines 192-194.

      Lines 172-174, the pictures are too small to see these details. The same for BIK1 (line 187).

      We have split up figure 3 into two separate figures (figures 3 and 4), to allow for them to be displayed larger, so that more details can be observed. Of course, it would also be helpful to do some confocal microscopy on specific regions of interest of these stereomicroscopic images to obtain high-resolution images of these regions, but, unfortunately, we did not reach this point in this project, before our team was disbanded, and we therefore only have the overview images to get a general idea of the responsiveness of the different reporters.

      Line 174-176, which results are these referring to? The same for line 200-203.

      We assume that this was not clear because we previously failed to make it clear that the control supplementary figures are from experimental controls/mock. We have reworded both paragraphs to, hopefully, explain it a bit better, and included the number of the supplementary figure that each refers to. It’s now in lines 212-215 and 237-242.

      This study provides a valuable collection of vectors/constructs for investigation of transcriptional dynamics of plant immunity genes and should attract broad interest of the plant immunity field.

      Thank you very much.

      The current study by Calabria et al., entitled "pGG-PIP: A GreenGate (GG) entry vector collection with Plant Immune system Promoters (PIP)," reported the development of a set of GreenGate-compatible entry plasmids that contain promoter sequences of a series of immunity-related genes. This tool enables live-cell observation of immune responses at a cellular resolution. Being compatible with many other GreenGate tools, it opens up a door toward simultaneous visualization of different but overlapping immune pathways and ultimately describes the 4D dynamics of plant immunity. It is more than expected that these constructs will be used by a wide range of researchers and contribute to the ultimate understanding of plant innate immunity.

      Thank you very much.

      It is exciting that the authors observed the marker expression by a fluorescent stereomicroscope. This allows for non-destructive observation of response over time, keeping the system gnotobiotic. However, it was partly disappointing that the authors did not take full advantage of this. It would have been much nicer if the authors observed the infection process over time, such that one could tell when and where the response starts, and whether local and systemic reactions occur simultaneously or instead require local-to-systemic signal transduction. They indeed seem to have done such time-course observation (line 378) but did not provide the results. I am curious to know what the authors could have found from those experiments. It would also be a strong appealing point of this method and is therefore highly encouraged.

      We absolutely agree that this temporal data would be valuable and interesting. So far, we always imaged the colonization sites in the root tips from the first day when they become visible, until the day when the entire root was colonized/dying. However, we only recorded the infection sites directly, and did not image the entire plants, and local as well as systemic responses. This is, of course, something that we would have liked to do, and planned to do in the future, but, so far, we have not gotten to that point. We also attempted to use the images of the infection sites that we have recorded over time to obtain information about disease progression, e.g., colonization speed of the fungus, but this data is not (yet) at a point, where we feel confident that we have enough information to draw solid conclusions. So, while we absolutely agree that this kind of whole-plant imaging with both, high spatial and temporal resolution, must be the aim, at this point, unfortunately, we simply are not at that place yet.

      Immune responses are not always induction of expression but sometimes reduction. Some genes up-regulated in the first phase will also be down-regulated afterward in order to go back to the initial non-responding state. During such down-regulation, the expression of a fluorescence marker gene might not accurately reflect the real expression levels, because the translated proteins might stay longer even while its transcription is suppressed. To address this point, it is suggested that the authors observe the marker lines in the presence of a translation inhibitor, such as cycloheximide, and quantitatively analyze the dynamics of protein degradation when no new protein is synthesized.

      This is indeed an excellent point. Unfortunately, we have to first say that due to funding issues we are currently unable to do this experiment. However, we did include two things in the revised manuscript: First, we have put in a note that this is indeed a caveat of the system that must be acknowledged (lines 334-337). Second, we have included some information from a different study, which at least addresses this point to some degree. We have imaged the transcriptional response of the WRKY11 transcription factor in response to colonization by Fo5176, and in this case, we not only see a local upregulation next to the colonization site, but we see a complete switch in expression pattern. As part of this switch, WRKY11, which was expressed in all root tissues and cells in uninfected control experiments, is switched off in all tissues and cells except the vascular cells close to the infection site. So here, we indeed have a downregulation of the reporter. In these experiments, signal from the fluorescent WRKY11 reporter disappears from the cells within a day. As we imaged once per day, we can, unfortunately, not get more specific than this one-day window. The day before colonization of the tip, signal is seen in all tissues; one day later, if/when the vasculature is colonized in the tip, there is no weak/residual fluorescence left in the cells of the outer tissues. So we can at least state that we would probably also detect downregulation of expression, despite the protein lifetime. Importantly, all our imaging is done on a regular stereomicroscope, and thus, camera sensitivity is moderate. I could imagine that we may be able to detect some residual fluorescence with ultra-sensitive cameras at a spinning disc, or a sensitive detector at a laser-scanning microscope, but we have not tested this. We have added this information in lines 337-347. I apologize that we can’t add more information than this.

      It is remarkable that the authors managed to clone 75 promoter sequences. However, whether all promoters work as expected was not clearly assessed in the present study. Did the authors only transform plants with PEP1, PEP2, PEPR1, and PEPR2 marker constructs? How would they know that the other promoters also work appropriately? In terms of providing these constructs to the research community, it is necessary to disclose to what extent the expression has been validated in planta and which promoters have not been assessed.

      This is indeed important information. We have not used the promoters in mutant complementation assays, and have added this caveat in lines 348-350.

    1. Reviewer #1 (Public Review):

      This paper provides valuable (and impressive) data on the geometry of cerebellar foliation among 56 species of mammals and gives novel insights into the evolution of cerebellar foliation and its relationship with the anatomy of the cerebrum. Thus far, the majority of the research on brain folding focuses on the cerebral cortex with little research on the cerebellum. The results from Heuer et al confirm that the evolution of the cerebellum and cerebrum follows a concerted fashion across mammals. Moreover, they suggest that both the cerebrum and cerebellum folding are explained by a similar mechanistic process.

      1. Although I found the introduction well written, I think it lacks some information or needs to develop more on some ideas (e.g., differences between the cerebellum and cerebral cortex, and folding patterns of both structures). For example, after stating that "Many aspects of the organization of the cerebellum and cerebrum are, however, very different" (1st paragraph), I think the authors need to develop more on what these differences are. Perhaps just rearranging some of the text/paragraphs will help make it better for a broad audience (e.g., authors could move the next paragraph up, i.e., "While the cx is unique to mammals (...)").

      2. Given that the authors compare the folding patterns between the cerebrum and cerebellum, another point that could be mentioned in the introduction is the fact that the cerebellum is convoluted in every mammalian species (and non-mammalian spp as well) while the cerebrum tends to be convoluted in species with larger brains. Why is that so? Do we know about it (check Van Essen et al., 2018)? I think this is an important point to raise in the introduction and to bring it back into the discussion with the results.

      3. In the results, first paragraph, what do the authors mean by the volume of the medial cerebellum? This needs clarification.

      4. In the results: When the authors mention 'frequency of cerebellar folding', do they mean the degree of folding in the cerebellum? At least in non-mammalian species, many studies have tried to compare the 'degree or frequency of folding' in the cerebellum by different proxies/measurements (see Iwaniuk et al., 2006; Yopak et al., 2007; Lisney et al., 2007; Yopak et al., 2016; Cunha et al., 2022). Perhaps change the phrase in the second paragraph of the result to: "There are no comparative analyses of the frequency of cerebellar folding in mammals, to our knowledge".

      5. Sultan and Braitenberg (1993) measured cerebella that were sagittally sectioned (instead of coronal), right? Do you think this difference in the plane of the section could be one of the reasons explaining different results on folial width between studies? Why does the foliation index calculated by Sultan and Braitenberg (1993) not provide information about folding frequency?

      6. Another point that needs to be clarified is the log transformation of the data. Did the authors use log-transformed data for all types of analyses done in the study? Write this information in the material and methods.

      7. The discussion needs to be expanded. The focus of the paper is on the folding pattern of the cerebellum (among different mammalian species) and its relationship with the anatomy of the cerebrum. Therefore, the discussion on this topic needs to be better developed, in my opinion (especially given the interesting results of this paper). For example, with the findings of this study, what can we say about how the folding of the cerebellum is determined across mammals? The authors found that the folial width, folial perimeter, and thickness of the molecular layer increase at a relatively slow rate across the species studied. Does this mean that these parameters have little influence on the cerebellar folding pattern? What mostly defines the folding patterns of the cerebellum given the results? Is it the interaction between section length and area? Can the authors explain why size does not seem to be a "limiting factor" for the folding of the cerebellum (for example, even relatively small cerebella are folded)? Is that because the 'white matter' core of the cerebellum is relatively small (thus more stress on it)?

      8. One caveat or point to be raised is the fact that the authors use the median of the variables measured for the whole cerebellum (e.g., median width and median perimeter across all folia). Although the cerebellum is highly uniform in its gross internal morphology and circuitry's organization across most vertebrates, there is evidence showing that the cerebellum may be organized in different functional modules. In that way, different regions or folia of the cerebellum would have different olivo-cortico-nuclear circuitries, forming, each one, a single cerebellar zone. Although it is not completely clear how these modules/zones are organized within the cerebellum, I think the authors could acknowledge this at the end of their discussion, and raise potential ideas for future studies (e.g., analyse folding of the cerebellum within the brain structure - vermis vs lateral cerebellum, for example). I think this would be a good way to emphasize the importance of the results of this study and what are the main questions remaining to be answered. For example, the expansion of the lateral cerebellum in mammals is suggested to be linked with the evolution of vocal learning in different clades (see Smaers et al., 2018). An interesting question would be to understand how foliation within the lateral cerebellum varies across mammalian clades and whether this has something to do with the cellular composition or any other aspect of the microanatomy as well as the evolution of different cognitive skills in mammals.

    1. Considerate

      My reflections here build on Lino Pertile’s 2010 essay, ‘L’inferno, il lager, la poesia’. Pertile notes the profound correspondence between the opening poem of the book (OC I, 139) and this chapter. He points out how the main theme of Levi’s book, the dehumanising experience in the Lager, based on the annihilation of people’s identity, is expressed in the poem and resurfaces explicitly again in the chapter dedicated to Dante’s Ulysses. The key term revealing the correspondence of themes and intentions is ‘Considerate [consider]’, used twice in Levi’s poem (‘Consider if this is a man | … | Consider if this is a woman’) and rooted in the memory of Dante’s famous tercet where Ulysses addresses his crew as they sail towards the horizon of their last journey beyond the pillars of Hercules: ‘Considerate la vostra semenza: | fatti non foste a viver come bruti, | ma per seguir virtute e canoscenza’ (Inf. 26, 118-20 and OC I, 228).

      There are many other correspondences between the chapter of Ulysses and the opening poem, besides the ‘Considerate’, and they are profound and filtered through the theme of memory, an eminently Dantean theme: the urgency to fix in the memory itself what is or will be necessary to tell, or the urgency to express and recount what is deposited in memory. Indeed, for Levi, the memory of each individual person contains that person’s humanity.

      Memory is immediately activated as Primo and Jean exit the underground gas tank (‘He [Jean] climbed out and I followed him, blinking in the brightness of the day. It was warm [tiepido] outside; the sun drew a faint smell of paint and tar from the greasy earth that made me think of [mi ricordava] a summer beach of my childhood'). Temporarily escaping hell by means of a ladder (a sort of Dantesque ‘natural burella’), it is the tiepido sun and a characteristic smell that evoke the childhood memory and that at the same time the reader cannot avoid connecting to the tiepide case of the initial poem (‘You who live safe | in your heated houses [tiepide case]’ [my emphasis]). It is then around the memory ‘of our homes, of Strasbourg and Turin, of the books we had read, of what we had studied, of our mothers’ that another theme in the chapter coalesces, the theme of friendship (‘He and I had been friends for a week’), a theme that had already emerged in a more general connotation in the opening poem (‘visi amici’). Warmth, friendship (visi amici…Jean), the kitchens as destination for Primo and Jean’s walk (the walk from the tank with the empty pot is ‘the ever welcomed opportunity of getting near the kitchens’, not for that hot food [cibo caldo] evoked in the poem, but for the soup of the camp, an alienating incarnation of Dantesque ‘pane altrui’ whose various names are dissonant). During the respite of the one hour walk from the tank to the kitchens, the intermittent memory of Dante’s canto emerges as if from an underground consciousness, the memory of Inferno as a partial and imperfect mirror of the human condition in the Lager, Ulysses as poetic memory, a sudden epiphany of a semenza, a seed, of humanity that the Lager is made to suppress, and Primo’s wondering in the face of this sudden internal revelation of still possessing an intact humanity. 
Primo’s memory of his home resurfaces as if springing from the memory of Dante’s text: the ‘montagna bruna’ of Purgatory is reflected in the memory of ‘my mountains, which would appear in the evening dusk [nel bruno della sera] when I returned from Milan to Turin!' But the real, familiar landscape is too heartbreaking a memory of ‘sweet things cruelly distant’, one of those hurtful thoughts, ‘things one thinks but does not say’. There is an epiphanic memory then, the poetic memory that surfaces during the walk and that reveals to Primo that he still is a man, a memory to which he clings despite the sense of his own audacity (‘us two, who dare to talk about these things with the soup poles on our shoulders’); there is also a more intimate memory, equally pulsating with life and humanity - but dangerous, because it makes Primo vulnerable to despair, threatening his own survival in the camp.

      The urgent need to remember Dante’s verses in this chapter develops the theme of memory, which has been central from the opening poem. In Levi’s poem, though, memory is perceived from a different angle: the readers (who live safe…) must honour that memory and transmit it as an imperative testimony of what happened in the concentration camp from generation to generation, testifying to the suffering of the man and the woman ‘considered’ in the poem. This is a memory to be carved in one’s heart, which must accompany those who receive it in every action and in every moment of each day like a prayer. Not coincidentally the poem follows the text of the most fundamental prayer of Judaism, the Shemà Israel, which is read twice a day, a memory to be passed on to one’s own children, a responsibility which is a sign of one’s humanity. The commandment to remember of the opening poem (‘I consign these words to you. | Carve them into your hearts') issues a potential curse to the reader, threatening the destruction of what most fundamentally characterises their humanity - home, health, children: ‘Or may your house fall down, | May illness make you helpless, | And your children turn their eyes from you’. Finally, Primo’s act of remembering during the walk to the kitchens is submerged by the Babelic soup (‘Kraut und Rüben…cavoli e rape…Choux et navets…Kàposzta és répak…Until the sea again closed – over us’) and yet the memory of it becomes part of his testimony in such a central chapter of the book written after surviving the Shoah. If the memory of Dante’s verses contributed to Primo’s faith in his own humanity and his psychological and physical survival in the camp, he then accomplishes the commandment of memory and his responsibility as a man through his own writing.

      CS

    2. non lasciarmi pensare alle mie montagne

      Very often, when we think about ‘Il canto di Ulisse’, we tend to recall only the most famous pages in which Levi tries to remember Dante’s canto. The depth and sense of urgency of the Ulyssean passages are so overwhelming and passionate that they may distract us from other elements in the chapter. However, if we go back to the text and read it closely, we cannot avoid noticing that, after a brief opening in which Levi introduces Pikolo and narrates how he came to be Pikolo’s ‘fortunate’ chaperone to collect the soup for the day, ‘Il canto di Ulisse’ also dwells quite significantly on a moment of domestic memories. While going to the kitchens, Levi writes: ‘Si vedevano i Carpazi coperti di neve. Respirai l’aria fresca, mi sentivo insolitamente leggero’. This is the first moment in the chapter in which Levi refers to the mountains as something that revitalises him and makes him feel fresh and light, both physically and mentally.

      This moment foreshadows another, also in this chapter, when Levi goes back to his mountains, those close to Turin, and compares them to the mountain that the protagonist of Dante’s canto, Ulysses, encounters just before his shipwreck with his companions:

      ... Quando mi apparve una montagna, bruna

      Per la distanza, e parvemi alta tanto

      Che mai veduta non ne avevo alcuna.

      Sì, sì, ‘alta tanto’, non ‘molto alta’, proposizione consecutiva. E le montagne, quando si vedono di lontano... le montagne... oh Pikolo, Pikolo, di’ qualcosa, parla, non lasciarmi pensare alle mie montagne, che comparivano nel bruno della sera quando tornavo in treno da Milano a Torino! Basta, bisogna proseguire, queste sono cose che si pensano ma non si dicono. Pikolo attende e mi guarda. Darei la zuppa di oggi per saper saldare ‘non ne avevo alcuna’ col finale.

      The significance of the mountains in Levi’s narration is confirmed in this passage. For him, the mountains represent his experience of belonging, his youthful years, and his work as a chemist – the job he was doing when he commuted by train from Turin to Milan. At the same time, Levi’s own memories of the mountains intertwine and overlap with another mountain, Dante’s Mount Purgatory. Here, a deep and perhaps not fully conscious intertextual game starts to emerge and to characterise Levi’s writing. The lines that Levi does not remember are these (compare, on the Dante page):

      Noi ci allegrammo, e tosto tornò in pianto,

      ché de la nova terra un turbo nacque,

      e percosse del legno il primo canto.

      For Dante’s Ulysses, Mount Purgatory signifies the final moment of his adventure and his desire for knowledge. The marvel and enthusiasm that Ulysses and his company feel when they see the mountain is suddenly transformed into its contrary. From the mountain, a storm originates that will destroy the ship and swallow its crew: ‘Tre volte il fe’ girar con tutte l’acque, | Alla quarta levar la poppa in suso | E la prora ire in giù, come altrui piacque’. Dante’s Mount Purgatory, so majestic and spectacular, represents the end of any desire for knowledge that aims to find new answers to and interpretations of human existence in the world without God’s word.

      Going back to Levi’s text, we find that, instead, in a kind of reverse overlapping between his image and that of Ulysses, the image of the mountain of Purgatory suggests to Levi a very different set of thoughts that, although seemingly and similarly overwhelming, opens up new interpretations: ‘altro ancora, qualcosa di gigantesco che io stesso ho visto ora soltanto, nell’intuizione di un attimo, forse il perché del nostro destino, del nostro essere oggi qui’. For a moment, it is almost as if Levi, a new Dantean Ulysses in a new Inferno, stands in front of Mount Purgatory and forgets the terzine and the shipwreck. Maybe Levi cannot or does not want to remember those terzine because the mountain in Purgatory represents something very different for him than for Dante’s Ulysses. Levi’s view of the mountain does not lead to a moment of recognition of sin, as it does in Dante’s Ulysses. For him, the mountain, like his mountain range, is the gateway to knowledge, enrichment, and illumination and to a world that lies beyond the imposed limits of traditional, constricting, and distorted views and that awaits discovery (‘qualcosa di gigantesco che io stesso ho visto ora soltanto’). Something about and beyond the Lager.

      To better understand how the mountains are central in ‘Il canto di Ulisse’, we have to remember that Levi’s view of the mountains strongly depends on his anti-Fascism, which he expressed particularly vigorously in two moments of his life: during his months in the Resistance, just before he was captured and sent to Fossoli, and, even more intensely, during the adventures of his youth, when he was a free young man who enjoyed climbing the mountains surrounding Turin. As Alberto Papuzzi has suggested, ‘le radici del suo rapporto con la montagna sono ben piantate in quella stagione più lontana: radici intellettuali di cittadino che cercava sulla montagna, nella montagna, suggestioni e risposte che non trovava nella vita, o meglio nell’atmosfera ispessita di quella vita torinese, senza passato e senza futuro’ (OC III, 426-27). Indeed, reports Papuzzi, Levi confirms that:

      Avevo anche provato a quel tempo a scrivere un racconto di montagna […]. C’era tutta l’epica della montagna, e la metafisica dell’alpinismo. La montagna come chiave di tutto. Volevo rappresentare la sensazione che si prova quando si sale avendo di fronte la linea della montagna che chiude l’orizzonte: tu sali, non vedi che questa linea, non vedi altro, poi improvvisamente la valichi, ti trovi dall’altra parte, e in pochi secondi vedi un mondo nuovo, sei in un mondo nuovo. Ecco, avevo cercato di esprimere questo: il valico.

      The heart of that epic story made its way into the chapter ‘Ferro’ in Il sistema periodico. The discovery of this (brave) new world, ‘mondo nuovo’, is an integral part and a direct achievement of Levi’s experience in the mountains. The mountains open a new understanding and a new perspective on the world.

      Something that escapes common understanding is revealed through the experience of the mountains, both in Levi’s memories of his youth and in his literary recounting of Auschwitz. Reciting Dante in ‘Il canto di Ulisse’ is therefore not only an intertextual exercise for Levi. Only by inserting Levi’s literary references in the complexity of his own experience – before, during, and after Auschwitz – can we fully capture the depth of his reflections. Levi mentally and metaphorically brought to Auschwitz not only Dante but also his ‘metafisica dell’alpinismo’. Together, they contributed to his attempt to come to terms with that reality.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      […] Overall, the authors build a convincing case for TEs being an important source of regulatory information. I don't have any issues with the analysis, but I am concerned about the sweeping claims made in the title. Once you get rid of eQTLs that could be altered by either SNPs or TIPs and include only those insertions that show strong evidence of selection, the number of genes is reduced to only 30. And even in those cases, the observed linkage is just that, not definitive evidence for the involvement of TEs. Although clearly beyond the scope of this analysis, transgenic constructs with the TEs present or removed, or even segregating families, would have been far more convincing. 

      We note that the referee thinks that we "built a convincing case for TEs being an important source of regulatory information". This is what we wanted to convey in the title, where we were careful not to claim that TEs are the most important contributor to gene expression variability in rice populations. However, we agree with the referee that the title may be improved to better describe the results presented. We have therefore changed the title to "Transposons are an important contributor to gene expression variability under selection in rice populations".

      With respect to demonstrating causality by removing or introducing the TEs, this is indeed work we plan to do but that, as stated by the referee, is beyond the scope of this analysis.

      The fact that many of the eQTL-TIPs were relatively old is interesting because it suggests that selection in domesticated rice was on pre-existing variation rather than new insertions. This may strengthen the argument because those older insertions are less likely to be purged due to negative effects on gene expression. Given that the sequence of these TEs is likely to have diverged from others in the same family, it would have been interesting to see if selection in favor of a regulatory function had caused these particular insertions to move away from more typical examples of the family. 

      The TIP-eQTLs are from different classes, superfamilies and families, and the number of TIP-eQTLs of the same family is too small to deduce sequence commonalities (4.6 TIP-eQTLs/family in indica and 3.6 TIP-eQTLs/family in japonica). On the other hand, the effect of TIPs on expression can be positive or negative (we actually show that it is often negative). In the latter case, a plausible scenario would be that the insertion inactivates a promoter element, in which case it would be the insertion itself, and not the actual sequence of the TE, that would be selected.

      Also, previous work done in our lab has shown that TEs can amplify and mobilize transcription factor binding sites that are bound by the TF even when they are not close to a gene and therefore probably not directly affecting gene expression (Hénaff et al., 2014, The Plant Journal). In that case, the sequence of the eQTL TEs and those that are far away from genes will not necessarily differ.

      Reviewer #2 (Public Review):

      In this manuscript, Castanera et al. investigated how transposable elements (TEs) altered gene expression in rice and how these changes were selected during the domestication of rice. Using GWAS, the authors found many TE polymorphisms in the proximity of genes to be correlated to distinct gene expression patterns between O. sativa ssp. japonica and O. sativa ssp. indica and between two different growing conditions (wet and drought). Thereby, the authors found some evidence of positive selection on some TE polymorphisms that could have contributed to the evolution of the different rice subspecies. These findings are underlined by some examples, which illustrate how changes in the expression of some specific genes could have been advantageous under different conditions. In this work, the authors manage to show that TEs should not be ignored when investigating the domestication of rice as they could have played an important role in contributing to the genetic diversity that was selected. However, this study stops short of identifying causations as the used method, GWAS, can only identify promising correlations. Nevertheless, this study contributes interesting insights into the role TEs played during the evolution of rice and will be of interest to a broader audience interested in the role TEs played during the evolution of plants in general.

      We agree with the referee that the results presented do not allow conclusions on causality, and we have been careful not to claim in the manuscript that they do. We plan to add or remove TEs using CRISPR/Cas9 approaches to address this but, in line with referee 1's comment, we think this is beyond the scope of this analysis.

      ---------- 

      Reviewer #1 (Recommendations For The Authors): 

      Everything that I need to say is provided in the public portion of my review. 

      Reviewer #2 (Recommendations For The Authors): 

      Major concerns:

      1. The authors compare the proportion of the variance explained by the most significant TIP and SNP on the observed eQTLs associated with TIPs and SNPs. Thereby the authors conclude that TIPs explain more variance than SNPs. If I am not mistaken the GWAS was run separately for TIPs and SNPs, however, I am wondering if running the GWAS on the combined TIP and SNP dataset might be the better way to compare the variance explained by TIPs and SNPs on gene expression differences. It would be nice to see if these results also hold true if a TIP and SNP combined dataset is used as the most significant marker in a GWAS might not be the causal mutation but might just be linked to the causal mutation. Further in the TIP dataset, the number of markers is only 45k and in the SNP dataset, it is 1 000k, which could bias the GWAS toward finding markers that explain more of the variation in the dataset with fewer markers.

      We addressed the reviewer's concern by using two complementary approaches, whose results are described in the text (lines 119-121) and in the new Figure 1-figure supplement 1.

      First, we addressed the concern regarding the independent GWAS for TIPs and SNPs vs a combined strategy. For this, we built new japonica/indica genotype matrices combining all TIPs and SNPs and ran eQTL mapping again. Using the same strategy (association + FDR adjustment), we recovered 100% of the previous TIP-eQTLs and 99% of the previous SNP-eQTLs. We repeated the same analysis (proportion of expression variance), and the results were mostly the same (Figure 1-figure supplement 1A).

      Second, we addressed the two concerns (combined genotypes and different numbers of TIP and SNP markers) using a single approach. SNP matrices were LD-pruned using an r2 of 0.9 and then subsampled to the exact number of TIPs (Indica = 30,396, Japonica = 25,168). We verified that these SNPs covered the 12 rice chromosomes well. SNP and TIP genotypes were then merged into a single matrix, and eQTL mapping was repeated for each of the subspecies and conditions using the same parameters as in the previous version of the manuscript. 100% of the previously reported TIP-eQTL associations were found using this new approach. Nevertheless, we found a substantial drop in sensitivity for the SNP-eQTLs (only 15-20% of the previous associations were detected), possibly due to the strong reduction in the number of SNPs (> 95%), which results in a much lower number of markers at < 5 Kb from genes. We repeated the analysis of Figure 1D, and observed very similar results (Figure 1-figure supplement 1D). A large share of TIP-eQTL associations do not coincide with SNP-eQTLs (74% in indica, 83% in japonica), indicating that TIP-eQTL mapping is complementary to SNP-eQTL mapping as it uncovers additional associations (note that in this case the overlap between TIP-eQTLs and SNP-eQTLs is lower than in the previous analysis due to the lower sensitivity of SNP-eQTL mapping using fewer markers). In the cases where both a TIP and a SNP coincide as eQTL, TIPs explained slightly more variance than SNPs in both indica and japonica (in 54% of the cases TIP variance > SNP variance).
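      The marker-balancing step described above (prune SNPs by LD, subsample them to the TIP count, merge both marker sets) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the greedy all-against-all pruning, the 0/1/2 genotype coding, the function names and the toy marker names are all hypothetical, and real tools typically prune within sliding genomic windows.

```python
import random
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (assumed 0/1/2 coding)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker: correlation is undefined, treat as unlinked
    return (sxy * sxy) / (sxx * syy)


def ld_prune(markers, threshold=0.9):
    """Greedy pruning: keep a marker only if its r2 with every kept marker is below threshold."""
    kept = []
    for name, genotypes in markers:
        if all(r_squared(genotypes, g) < threshold for _, g in kept):
            kept.append((name, genotypes))
    return kept


def subsample(markers, n, seed=0):
    """Randomly keep n markers (all of them if fewer than n are available)."""
    if n >= len(markers):
        return list(markers)
    return random.Random(seed).sample(markers, n)


# Toy data: three SNPs and one TIP genotyped in six accessions (hypothetical values).
snps = [
    ("snp1", [0, 1, 2, 0, 1, 2]),
    ("snp2", [0, 1, 2, 0, 1, 2]),  # identical to snp1, so it is pruned
    ("snp3", [2, 0, 1, 1, 0, 2]),
]
tips = [("tip1", [0, 0, 1, 1, 0, 1])]

pruned = ld_prune(snps, threshold=0.9)
balanced_snps = subsample(pruned, len(tips))
merged = tips + balanced_snps  # single matrix with equal TIP and SNP counts
```

      Matching the number of SNPs to the number of TIPs removes the marker-count imbalance raised by the reviewer, at the cost of the reduced SNP-eQTL sensitivity that the authors report.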

      2. Line 146 to 152: in this section, the authors describe overlaps between TIP-eQTLs in two different growth conditions, however, in the text it is not mentioned if the TIPs have the same effect on gene expression in the two conditions or if the gene expression is up-regulated in one condition but down-regulated in the other. This information would be interesting to have here, especially as the authors go on to say that only a small number of TIP-eQTLs are stress-specific. The same comment also goes for the eQTL overlap described on lines 167 to 170. 

      We checked the effect type (positive or negative) of TIP-eQTLs in both scenarios (associations shared between wet/dry conditions, and associations shared between subspecies). In both cases, 100 % of the shared TIP-eQTLs have the same effect type in the two conditions or subspecies. We have updated the text accordingly (Lines 55-157 and Lines 179-181)
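      The direction check described above can be sketched as a comparison of effect signs for associations shared between two conditions. The dictionary layout, the identifiers and the effect values below are hypothetical and serve only to illustrate the logic.

```python
def direction_concordance(effects_a, effects_b):
    """Count shared associations whose effect has the same sign in both conditions.

    Each argument maps a (TIP id, gene id) pair to an effect estimate.
    Returns (number concordant, number shared).
    """
    shared = effects_a.keys() & effects_b.keys()
    concordant = [
        key for key in shared
        if (effects_a[key] > 0) == (effects_b[key] > 0)
    ]
    return len(concordant), len(shared)


# Hypothetical effect estimates for wet and dry conditions.
wet = {("TIP_1", "geneA"): 0.8, ("TIP_2", "geneB"): -1.2, ("TIP_3", "geneC"): 0.4}
dry = {("TIP_1", "geneA"): 0.5, ("TIP_2", "geneB"): -0.3, ("TIP_4", "geneD"): 1.0}

n_same, n_shared = direction_concordance(wet, dry)
```

      A result in which the two counts are equal corresponds to the 100% same-direction outcome reported by the authors.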

      3. Lines 192 to 196: the authors mention that the frequency of non-eQTL-TIPs was at the same frequency in indica and japonica, which is in contrast to eQTL-TIPs. However, on line 132 it is mentioned that eQTL-TIPs were overrepresented in 1 kb regions upstream of genes. Hence, is the pattern of the frequency of non-eQTL-TIPs being at the same frequency in indica and japonica also observed in the 1 kb regions upstream of genes and/or if the distribution of non-eQTL-TIPs is matched to one of the eQTL-TIPs? Or is this pattern driven by non-eQTL-TIPs far away from genes?

      We checked the frequencies of TIPs at 1 Kb upstream of genes and found that the general pattern is maintained, with the frequencies of TIP non-eQTLs being more correlated than those of TIP-eQTLs. We have included this information (lines 204-206) and added a new supplementary file (Figure 2-figure supplement 2).

      4. In the discussion, the authors could briefly discuss how linked selection affecting TIPs could contribute to the observed results. After reading the second example in the result section where one of the example TIPs (TIP_50059) is found on the Hap B which contains "some additional structural differences" (line 290), I was left wondering how much of the increase in TIP frequency can be attributed to genetic hitchhiking? And how much of the results could be caused by linked selection, especially when considering that structural variations are not included in the GWAS analyses. 

      We agree with the referee that some of the TIP eQTLs described here might not be the actual cause of expression variability (e.g., a TIP linked with the causal mutation), although we cannot know the exact fraction. This is stated in several places of the results and discussion sections. However, the fact that TIPs tend to explain more variance than SNPs and that TIP eQTLs, but not SNP eQTLs, tend to concentrate in the upstream proximal region of genes where most transcription regulatory sequences are located (Figure 1), suggests that TIP eQTLs could be causal more frequently than SNP eQTLs. We revised the text to ensure that we convey this message appropriately.

      Minor comments: 

      • Lines 80 to 83: the description of the rice phylogeny should be moved to the introduction. 

      Done (Lines 68-72)

      • Line 177 to 186: It was unclear to me if the authors checked whether the ancestral rice population lacked the TIPs described in this section as recently inserted in the indica and japonica ssp. It would be nice to add this information to this section.

      Thanks to the referee's comment we noticed an imprecision in the text. The approximately 1/3 of subspecies-specific TIP-eQTLs refers to the TIPs at 3% MAF (i.e., some of these insertions could be present at > 3% in indica, but at < 3% MAF in japonica). We now indicate only the TIPs that are truly specific to either of the two subspecies (frequency is zero in one of the two) and looked for their presence in rufipogon:

      59 insertions are indica-specific. Of those, 33 are present in rufipogon.

      21 insertions are japonica-specific. Of those, 5 are present in rufipogon.

      We have incorporated this information in the manuscript (Lines 185-189). The species-specific TIPs are also available in the Supplementary File 3.

      • Line 353: "have two of more TIPs" should be "two or more" 

      Done (Line 369)

      • Figure 1D: Using a square layout instead of a rectangle layout for the plot will make it easier to interpret. 

      Done.

    1. wide variety of methods for any given project.

      Having a variety of methods to get to a solution is extremely important, as many people can come at a problem from different angles and solve it differently. However, this can breed confusion as to which way is the "right" way and which is the "wrong" way. It may also seem like someone else's way might not work, but we should do a thorough examination from their side to see why they think it might work, and maybe square that up with the harsh reality.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2023-01932

      Corresponding author(s): Dennis KAPPEI

      We would like to thank all reviewers for their recognition of our approach and the quality of our work as well as their constructive criticism.

      Reviewer #1

      Reviewer #1: The manuscript by Yong et. al. describes a comparison of various chromatin immunoprecipitation-mass spectrometric (ChIP-MS) methods targeting human telomeres in a variety of systems. By comparing antibody-based methods, crosslinkers, dCas9 and sgRNA targeted methods, KO cells and various controls, they provide a useful perspective for readers interested in similar experiments to explore protein-DNA interactions in a locus-specific manner.

      Response: We would like to thank the reviewer for the feedback and the appreciation of our work.

      Reviewer #1: While interesting, I found it somewhat difficult to extract a clear comparison of the methods from the text. It was also difficult to compare as data and findings from each method was discussed in its own context. Perhaps it is not in their interest to single out a specific method and it is indeed true that there are caveats with each of the methods.

      Response: Across our manuscript we have established one single workflow, for which we present some technical comparisons (e.g. using single or double cross-linking in Fig. 2a/b), technical recommendations such as the use of loss-of-function controls (e.g. Fig. 1c vs. Fig. 2a and Extended Data Fig. 3g vs. 3i) and an application to unique loci using dCas9 (Fig. 3f). Based on the suggestions below, we believe that we can communicate our approach more clearly.

      Reviewer #1: I think the manuscript would be of interest but I believe that there are remaining questions that need to be addressed before publication. In particular, I found it difficult to reconcile the discrepancy in protein IDs between most experiments vs. the WT/KO experiment in Fig 2. The authors make a big deal about the importance of the KO control but I think the fewer proteins identified there may be experiment-specific and not general to the KO system. I ask that this be investigated more carefully by the authors in their revisions.

      Response: We thank the reviewer for highlighting this point. We do not think that the ChIP-MS comparison between U2OS WT and ZBTB48 KO clones (Fig. 2a) has experiment-specific caveats. Instead the KO controls as well as the dTAGV-1 degron system for MYB ChIP-MS (Extended Data Fig. 3) reveal antibody-specific off-targets, which are indeed false-positives. Please see below for further details.

      Reviewer #1: Ln 57: What is "standard double cross-linking ChIP reactions" in this context? Is it the two different crosslinkers? The two proteins? The reciprocal IPs of one protein, and blotting for another? It's not clear here or from Extended Fig 1A. Upon further reading, it seems to pertain to the two crosslinkers - if so, the authors should briefly describe their workflow to help readers.

      Response: As the reviewer correctly concludes, we indeed intended to highlight the use of two separate crosslinkers (formaldehyde/FA and DSP). This combination is important as illustrated in the side-by-side comparison of Fig. 2a and Fig. 2d. Here, we performed ZBTB48 ChIP-MS in five U2OS WT and five U2OS ZBTB48 KO clones. While the bait protein ZBTB48 was abundantly enriched in both experiments, in the samples that were fixed with formaldehyde only we lose about half of the telomeric proteins that are known to directly bind to telomeric DNA independent of ZBTB48, and all of their interaction partners. For instance, while the FA+DSP reaction in Fig. 2a enriched all six shelterin complex members, the FA-only reaction in Fig. 2d only enriches TERF2. These data suggest that the use of a second cross-linker helps to stabilise protein complexes on chromatin fragments. This is a critical message of our manuscript as ChIP-MS only truly lives up to its name if we can enrich proteins that genuinely sit on the same chromatin fragment without protein interactions to the bait protein. We will expand on this in both the text and our schematics in Fig. 1a and 3a to make this clearer for the readers.

      Reviewer #1: Ln 95: It is surprising and quite unclear to me why it is that the WT ZBTB48 U2OS pulldown in Fig 1B shows 83 hits for the WT vs Ig control experiment but 27 hits for the WT vs KO condition in Fig 2A. The two WT experiments have the same design and reagents, shouldn't they be as close as technical replicates and provide very similar hits?

      The authors seem to make the claim that most of the 'extra' proteins in WT vs Ig are abundant and false positives, but if this is so, shouldn't they bind non-specifically to the beads and be enriched equally in Ig control and ZBTB48 WT IPs?

      Response: We again thank the reviewer for raising this point and the need to explain in more detail why we interpret the difference between 83 hits (anti-ZBTB48 antibody vs. IgG; Fig. 1c) and 27 hits (anti-ZBTB48 antibody used in both U2OS WT and ZBTB48 KO cells; Fig. 2a) primarily as false-positives. The KO controls in Fig. 2a allow us to keep the ZBTB48 antibody constant while comparing the presence (WT) or absence (KO) of the bait protein. Hence, proteins that were enriched in the IgG comparison in Fig. 1c but that are lost in the WT vs. KO comparison in Fig. 2a are likely directly (or indirectly) recognised by the ZBTB48 antibody, akin to off-targets of this particular reagent. In a Western blot this would be equivalent to seeing multiple bands at different molecular weights with only the band belonging to the protein-of-interest disappearing in KO cells. To illustrate this we would like to refer to Extended Data Fig. 2, in which we have replotted the exact same data from Fig. 2a. In addition, we have highlighted there the proteins that were enriched in the IgG comparison in Fig. 1c. 46 proteins (in pink) are indeed quantified in the WT vs. KO comparison, but these proteins are found below the cut-offs (and most of them with very poor fold changes and p-values). In contrast to the other several hundred proteins common between both experiments that can be considered common background non-specifically bound to the protein G beads, these 46 proteins represent antibody-specific false-positives.

      The above consideration is not unique to ChIP-MS as illustrated by the Western blot example. We also do not claim novelty on the experimental logic, e.g. pre-CRISPR in 2006 Selbach and Mann demonstrated the usefulness of RNAi controls in immunoprecipitations (IPs) (PMID: 17072306). However, our data suggests that ChIP-MS is particularly vulnerable to this type of false-positives given that the approach requires (double-)cross-linking to sufficiently stabilise true-positives on the same chromatin fragment.

      To supplement the WT vs. ZBTB48 KO comparison, we had included a second experiment in the manuscript that illustrates the same point in even more dramatic fashion. First, KO controls are very clean in principle, but they themselves might come with caveats if e.g. the expression levels between WT and KO samples differ greatly. This might create a situation that the reviewer hinted to, i.e. differential expression of abundant proteins that would proportionally to their expression levels stick to the beads, resulting in “fold enrichments”. The resulting false positives could e.g. be controlled by matched expression proteomes. For ZBTB48 we have previously measured this (PMID: 28500257) and demonstrated that only a small number of genes are differentially expressed (~10) and hence we can interpret the WT vs. ZBTB48 KO comparison quite cleanly. However, for other classes of proteins such as transcription factors that regulate a large number of genes, E3 ligases etc. this might present a more serious concern. Therefore, we extended our loss-of-function comparison to such a transcription factor, MYB, by using the dTAGV-1 degron system. Importantly, the MYB antibody has been used in previous work for ChIP-seq applications (e.g. PMID: 25394790). Here, instead of 186 hits in the MYB vs. IgG comparison using the same MYB antibody in control-treated and dTAGV-1-treated cells (upon 30 min of treatment only) we only detect 9 hits. Again, similar to the WT vs. ZBTB48 KO comparison, 180 proteins are quantified in the DMSO vs. dTAGV-1 comparison, but these proteins fall below the cut-offs (Extended Data Fig. 3g vs. 3i). Again, we believe that this quite drastically illustrates how vulnerable ChIP-MS data is to large numbers of false-positives. This is not only a technical consideration as such datasets are frequently used in downstream pathway/gene set enrichment analyses etc. 
      Such large false discovery rates would obviously lead to error-carry-forward and additional (unintended) misinterpretations. We will carefully expand our textual description across the manuscript to make these points much clearer. In addition, we will move the previous Extended Data Fig. 3 into the main manuscript to more clearly highlight this important point.
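      The reasoning in this response, that hits seen against IgG but lost against a loss-of-function control are antibody-specific false-positives, can be expressed with plain set operations. The sketch below is illustrative only: ZBTB48 and TERF2 are named in the response, while the other identifiers and the function name are hypothetical placeholders, and the hit lists are assumed to have already been thresholded on fold change and p-value.

```python
def classify_hits(igg_hits, lof_hits, lof_quantified):
    """Split hits from an antibody-vs-IgG comparison using a loss-of-function (KO/degron) control.

    igg_hits:       proteins above the cut-offs in the antibody vs. IgG comparison
    lof_hits:       proteins above the cut-offs in the WT vs. loss-of-function comparison
    lof_quantified: all proteins quantified in the WT vs. loss-of-function comparison
    """
    retained = igg_hits & lof_hits
    # Quantified in the loss-of-function experiment but below its cut-offs:
    # candidates for antibody-specific off-targets.
    antibody_specific = (igg_hits - lof_hits) & lof_quantified
    # Not quantified at all in the loss-of-function experiment: no call possible.
    unresolved = igg_hits - lof_hits - lof_quantified
    return retained, antibody_specific, unresolved


igg_hits = {"ZBTB48", "TERF2", "OFFTARGET_1", "OFFTARGET_2", "UNSEEN_1"}
lof_hits = {"ZBTB48", "TERF2"}
lof_quantified = {"ZBTB48", "TERF2", "OFFTARGET_1", "OFFTARGET_2", "BACKGROUND_1"}

retained, antibody_specific, unresolved = classify_hits(igg_hits, lof_hits, lof_quantified)
```

      Keeping the third category separate avoids labelling a protein as a false-positive merely because it was not quantified in the control experiment.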

      Reviewer #1: Volcano plots in Figs 1, 2, and Suppl. Tables etc: Are the plotted points the mean of 5 replicates? Was each run normalized between the replicates in each group, for e.g. by median normalization of the log2 MS intensities? This does not appear to be the case upon inspection of the Suppl Tables. Given the variability in pulldown efficiency, gel digest and peptide recovery, this would certainly be necessary.

      Response: All volcano plots are indeed based on 4-5 biological replicates (most stringently in the WT vs. KO comparisons in Fig. 2, which are based on 5 independent WT and 5 independent ZBTB48 KO single cell clones). The x-axis of each volcano plot represents the ratio of mean MS1-based intensities between both experimental conditions in log2 scale. However, precisely to account for the variation that the reviewer highlighted, we did not base our analysis on raw MS1 intensities but used the MaxLFQ algorithm (PMID: 24942700) as part of the MaxQuant analysis software (PMID: 19029910) for genuine label-free quantitation across experimental conditions and replicates. In this context, we would also like to refer to a related comment by reviewer #2, based on which we will now add concordance information for each replicate (heatmaps for Pearson correlations and PCA plots). We will improve this both in the text and methods section accordingly.
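      As a rough illustration of the x-axis described above, the following sketch computes the log2 ratio of mean intensities for one protein across replicates. The Welch t statistic on log2 intensities is an assumption added here for illustration, since the response does not state which test underlies the p-values, and the intensity values are made up. The sketch takes already normalised intensities as input and does not reimplement MaxLFQ.

```python
import math
from statistics import mean, variance


def volcano_coordinates(bait, control):
    """Return (log2 ratio of mean intensities, Welch t statistic on log2 intensities).

    bait and control are lists of positive, already normalised intensities,
    one value per replicate, with at least two replicates and non-identical values.
    """
    log2_ratio = math.log2(mean(bait) / mean(control))
    log_bait = [math.log2(v) for v in bait]
    log_control = [math.log2(v) for v in control]
    standard_error = math.sqrt(
        variance(log_bait) / len(log_bait) + variance(log_control) / len(log_control)
    )
    t_statistic = (mean(log_bait) - mean(log_control)) / standard_error
    return log2_ratio, t_statistic


# Hypothetical intensities for one protein: three bait and three control replicates.
log2_ratio, t_statistic = volcano_coordinates([12.0, 16.0, 20.0], [3.0, 4.0, 5.0])
```

      In this toy case the mean intensities are 16 and 4, so the log2 ratio is 2, i.e. a four-fold enrichment.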

      Reviewer #1: Ln 125: The authors make the claim that the ChIP-MS experiments are inherently noisy, with examples from WT cells, dTAG system and IgG controls. This is likely the case, yet their experiments with WT vs KO cells do not identify as many proteins overall. I find this inconsistency somewhat unclear and does not seem to match the claim of ChIP-MS experiments and crosslinking adding to non-specificity. Can the authors add the total number of identified proteins in each volcano plot, for easier reference?

      Response: The number of identified proteins does not vary majorly between matched IgG and loss-of-function comparisons and for instance the single cross-linking (FA only) experiment in Fig. 2c has the largest number of quantified proteins among all ZBTB48 IPs. But we will of course add the requested information to all plots.

      Reviewer #1: I think the manuscript is of interest as it provides important benchmarks for ChIP-proteomics experiments. I believe that there are remaining questions that need to be addressed before publication. In particular, I found it difficult to reconcile the discrepancy in protein IDs between most experiments vs. the WT/KO experiment in Fig 2. The authors make a big deal about the importance of the KO control but I think the fewer proteins identified there may be experiment-specific and not general to the KO system. I ask that this be investigated more carefully by the authors in their revisions.

      Response: We would like to thank the reviewer for recognising our work as a source for important benchmarks for ChIP-MS experiments. We hope that with a more detailed description and discussion the highlighted aspects will be more clearly communicated. We originally conceived our manuscript as a short report and now realised that some of the information became too condensed and might therefore benefit from more extensive explanations.

      Reviewer #2

      Reviewer #2: Summary: In this manuscript, Yong and colleagues have introduced an optimized technique for studying actors on chromatin in specific regions with a localized approach thanks to revisited ChIP-mass spectrometry (MS) with label-free quantitation (LFQ). The authors exhibited the utility of their approach by demonstrating its effectiveness at telomeres from cell culture (human U2OS cells) to tissue samples (liver, mouse embryonic stem cells). As a proof of concept, this technique was tested by the authors with proteins from complex shelterin specific to telomeres (TERF2 and ZBTB48), transcription factors (MYB), and through dCas9-driven locus-specific enrichment. Notably, the authors created a U2OS dCas9-GFP clone and then introduced sgRNAs to target either telomeric DNA (sgTELO) or an unrelated control (sgGAL4). The cells expressing sgTELO exhibited a significant localization of telomeres and an enriched amount of telomeric DNA in ChIP with dCas9. They also found the proteins previously identified as known to be enriched at telomeres (for example, the 6 shelterin members).

      Moreover, the authors illustrated the importance of double crosslinking (formaldehyde (FA) and dithiobis(succinimidyl propionate) (DSP)) in ChIP-MS. Their data also demonstrated that ChIP-MS is inclined towards false-positives, possibly owing to its inherent cross-linking. However, by utilizing loss-of-function conditions specific to the bait, it can be tightly managed.

      • Can you show the concordance between biological replicates for each ChIP with LFQ? (heatmap of Pearson correlation and PCA plot). This will confirm the robustness of the use of LFQ.

      Response: We will add the requested concordance data for all volcano plots both in the form of heatmaps of Pearson correlation and PCA plots. Across our datasets, the replicates from the same experimental condition clearly cluster with each other and replicates have high concordance values of >0.9. As expected replicates for the target/bait samples have slightly higher concordance values compared to the negative controls (IgG or loss-of-function samples). We thank the reviewer for this suggestion as the new Extended Data panel will strengthen the illustration of our robust LFQ data.
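
      The replicate concordance check described above can be sketched as follows. This is a minimal illustration on simulated data, not the authors' analysis pipeline: the function name `replicate_correlation`, the toy intensities and the noise levels are all assumptions, and a PCA plot would be a separate step.

```python
import numpy as np

def replicate_correlation(log2_intensities):
    """Pearson correlation matrix between samples (columns).

    Rows are proteins, columns are samples. Proteins with a missing
    value (NaN) in any sample are dropped before correlating.
    """
    data = np.asarray(log2_intensities, dtype=float)
    complete = ~np.isnan(data).any(axis=1)
    return np.corrcoef(data[complete], rowvar=False)

# Simulated toy data (hypothetical): 200 proteins, 3 bait and 3 control replicates.
rng = np.random.default_rng(0)
abundance = rng.normal(25.0, 2.0, size=200)   # shared log2 protein abundance
enrichment = np.zeros(200)
enrichment[:20] = 4.0                         # 20 proteins enriched with the bait
bait = abundance[:, None] + enrichment[:, None] + rng.normal(0.0, 0.3, (200, 3))
ctrl = abundance[:, None] + rng.normal(0.0, 0.3, (200, 3))

# Columns 0-2 are bait replicates, columns 3-5 are control replicates.
corr = replicate_correlation(np.hstack([bait, ctrl]))
print(np.round(corr, 3))
```

      In such a matrix, replicates of the same condition should correlate more strongly with each other than with replicates of the other condition, which is what a clustered heatmap would display.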

      Reviewer #2: You say that your technique is "a simple, robust ChIP-MS workflow based on comparably low input quantities" (line 139). What would be really interesting for a technical paper would be a schematic and a table illustrating the differences between your method and the previously published methods (amount of material, timeline, ...) to really highlight the novelty in your optimized techniques.

      Response: We will add a comparison table with previous publications using ChIP-MS and for reference include some complementary approaches as requested by reviewer #3. On this note, we would like to stress that we are not “only” intending to use less material and to have an easy-to-adopt protocol. A cornerstone of our manuscript is to apply rigorous expectations to ChIP-MS experiments, in particular the ability to enrich proteins that independently bind to the same chromatin fragments as the bait protein (regardless of whether this is an endogenous protein or an exogenous, targeted bait such as dCas9). Otherwise, such experiments risk being regular protein IPs under cross-linking conditions, which as illustrated by our loss-of-function comparisons are prone to yield particularly large fractions of false-positives.

      Reviewer #2: It would be interesting to perform the dCas9 ChIP experiment in telomeric regions with and without LFQ. Since the novelty lies in this parameter, at no time does the paper show that LFQ really allows to have as many or more proteins identified but in a simpler way and with less material. A table allowing to compare with and without LFQ would be interesting.

      Response: We do not fully understand what the suggestion “without LFQ” refers to exactly. We assume that the reviewer is suggesting the use of a quantitative mass spectrometry approach other than LFQ, e.g. SILAC labelling, TMT labelling etc. Please note that we do not claim that LFQ quantification is per se superior to the various quantification methods that had been developed and widely used across the proteomics community, especially before instrument setups and analysis pipelines were stable enough for label-free quantification (a name that is strongly owed to this historic order of development). However, a central goal of our workflow is to make robust and rigorous ChIP-MS accessible to the myriad of laboratories that use ChIP-qPCR/-seq and that may not be extensively specialised in mass spectrometry. Both metabolic and isobaric labelling not only come at a higher cost but also present an experimental hurdle to non-specialists compared to performing biological replicates without any labelling, essentially the same way as for any ChIP-qPCR etc. experiment. We will further elaborate on these points in the manuscript to more clearly convey these notions.

      In general, with the right effort different quantitative methods should and will likely yield qualitatively similar results. However, comparisons between LFQ approaches (MaxLFQ, iBAQ,…) and labelling approaches (SILAC, TMT, iTRAQ) have already been better explored and verbalised elsewhere (e.g. PMID: 31814417 & 29535314). Therefore, we believe that this will add relatively little value to our manuscript.

      Reviewer #2: Put a sentence to explain "label free quantification". For a reader who is not at all familiar with this technique, it would be interesting to explain it and to quote the advantages compared to PLEX.

      Response: Thanks for highlighting this. In line with the point above as well as a similar comment by reviewer #1 we will improve this both in the main text and manuscript to clearly explain the terminology, the MaxLFQ algorithm (PMID: 24942700) used and to highlight the advantages compared to labelling approaches.

      Reviewer #2: What does the ranking on the right of each volcano plot represent (figure 1 b-e, figure 2a,d,e for example)? The top of the most enriched proteins in the mentioned categories? This is not very clear when looking at the volcano plot. It must be specified in the legend.

      Response: The numbering in these panels is meant to link protein names to the data points on the volcano plots. The order of hits is ranked based on strongest fold enrichment, i.e. from right to center. We will clarify this in the figure legends.

      Reviewer #2: General assessment/Advance: The authors explain in their article that the ChIP exploiting the sequence specificity of nuclease-dead Cas9 (dCas9) to target specific chromatin loci by directly enriching for dCas9 was already published. Here, the novelty of this study lies in the use of LFQ mass spectrometry to optimize the technique and make it easier to handle. Some comparisons with previous papers or data generated by the lab will be interesting to really show the improvement and the advantage to use LFQ and therefore, to highlight better the novelty of the study.

      Response: We thank the reviewer for this assessment and as mentioned above we will include such a comparison table. dCas9 has been used previously in a ChIP-MS approach termed CAPTURE (PMID: 28841410). While this is clearly a landmark paper that illustrated the dCas9 enrichment concept across multiple omics applications (i.e. not limited to proteomics), in their application to telomeres the authors enriched only 3 out of the 6 shelterin proteins with quite moderate fold enrichments (POT1: 0.99, TERF2: 2.13, TERF2IP: 1.06; in log2 scale). Based on this alone, POT1 and TERF2IP would not have qualified for our cut-off criteria. In addition, while the authors had performed three replicates, detection is only reported in 1-2 out of 3 replicates. While it is difficult to reconstruct statistical values based on the publicly accessible data, it is therefore unlikely that even these 3 proteins would have robustly been considered hits in our datasets. Similarly, using recombinant dCas9 with a sgRNA targeting telomeres that was in vitro reconstituted with sonicated chromatin extracts from 500 million HeLa cells (CLASP; PMID: 29507191), the authors identified only up to 3 shelterin subunits (TERF2, TERF2IP and TPP1/ACD) based on only 1 unique peptide each. For comparison, in our dCas9 ChIP-MS dataset all 6 shelterin subunits are identified with 9-19 unique peptides, contributing to our robust quantification. Even when considering cell line-specific differences (HeLa cells have shorter telomeres and hence provide less biochemical material for enrichment per cell), these comparisons illustrate that prior attempts struggled to robustly replicate even the most abundant telomeric complex members.

      Based on these findings, others had suggested that dCas9 “might exclude some relevant proteins from telomeres in vivo” (PMID: 32152500), implying that dCas9 ChIP-MS might inherently not be feasible including at repetitive regions such as telomeres. Therefore, we believe that our dCas9 ChIP-MS data is a proof-of-concept that the method has the genuine ability to robustly enrich key proteins at individual loci. In concordance with the comment above we will include a comparison table with previous papers and expand on these points in the discussion.
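
      To put the log2 fold enrichments quoted above for the CAPTURE dataset into linear terms, a short conversion may help. The cut-off used below is purely hypothetical (log2 of at least 2, i.e. 4-fold); the actual cut-off criteria are not stated in this response and should be taken from the manuscript.

```python
# log2 fold enrichments quoted in the response for the CAPTURE telomere data.
reported_log2 = {"POT1": 0.99, "TERF2": 2.13, "TERF2IP": 1.06}

# ASSUMPTION: illustrative cut-off of log2 >= 2 (4-fold). This value is not
# taken from the manuscript and only serves to show the comparison.
HYPOTHETICAL_LOG2_CUTOFF = 2.0

# A log2 value x corresponds to a linear fold enrichment of 2**x.
linear_fold = {protein: 2.0 ** value for protein, value in reported_log2.items()}
passing = [p for p, v in reported_log2.items() if v >= HYPOTHETICAL_LOG2_CUTOFF]

for protein, value in reported_log2.items():
    print(f"{protein}: log2 = {value:.2f}, linear = {linear_fold[protein]:.2f}-fold")
print("pass hypothetical cut-off:", passing)
```

      On a linear scale, a log2 value close to 1 corresponds to roughly a two-fold enrichment, which illustrates why POT1 and TERF2IP are described as only moderately enriched.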

      Reviewer #2: By presenting this technical paper, the authors allow laboratories across different fields to use this technique to gain insights into protein enrichment in specific chromatin regions such as the promoter of a gene of interest or a particular open region in ATAC-seq in an easier way and with less material. This paper holds value in enabling researchers to answer many pertinent questions in various fields.

      Response: We again thank the reviewer for this encouraging assessment and we do indeed hope that this manuscript makes a contribution to a much wider use of ChIP-MS approaches as a promising complement to existing genome-wide epigenetics analyses.

      Reviewer #3

      Reviewer #3: Strengths of the study:

      The study is well-structured and provides a robust workflow for the application of ChIP-MS to investigate chromatin composition in various contexts.

      The use of telomeres as a model locus for testing the developed ChIP-MS approach is appropriate due to its well-characterized protein composition.

      The comparison of WT vs KO lines for ZBTB48 is a rigorous way to control for false-positives, providing more confidence in the results.

      The direct comparison of double vs only FA-crosslinking provides valuable insights into the benefit of additional protein-protein crosslinking in ChIP-MS workflows.

      Response: We thank the reviewer for this assessment and we agree that the above are several of the key features of our manuscript.

      Reviewer #3: Areas for improvement: The novelty of the method is more than questionable as both ChIP-MS coupled to LFQ and dCas9 usage for locus-specific proteomics have been previously reported. The fact that the authors directly pulldown dCas9 instead of using a dCas9-fused biotin ligase and subsequent streptavidin pulldown is only a very minor change to previous methods (not even improvement). It would be more accurate for the authors to present their study as an optimization and rigorous validation of existing techniques rather than a novel approach.

      Response: While we appreciate where the reviewer is coming from, it occurs to us that most of the reviewer’s comments equate ChIP approaches with other complementary methods, in particular proximity labelling. The latter is indeed a powerful experimental strategy and in fact we are ourselves avid users. As highlighted to reviewer #1 as well, our manuscript was originally conceived as a shorter report and based on the feedback we will now expand our discussion to more broadly incorporate related approaches.

      However, we would like to stress that dCas9 ChIP-MS and dCas9-biotin ligase fusions are not the same thing and this is not a minor tweak to an existing protocol. While both approaches have converging aims – to identify proteins that associate with individual genomic loci – the experimental workflows differ fundamentally. Biotin ligases use a “tag and run” approach by promiscuously leaving a biotin tag on encountered proteins. Subsequently, cellular proteins are extracted and in fact proteins can even be denatured prior to enrichment with streptavidin beads. While this is an in vivo workflow that (depending on the biotin ligase used) may provide sensitivity advantages, it does not retain complex information. The latter is inherently part of ChIP workflows due to the use of cross-linkers. One obvious future application would be to maintain (= not to reverse as we have done here) the crosslink during the mass spectrometry sample preparation in order to read out cross-linked peptides to gain insights into interactions and structural features. We will now more clearly incorporate such notions into our discussion.

      In addition, we would like to stress that while this reviewer focuses primarily on the dCas9 aspect of our manuscript, we believe that our general ChIP-MS workflow including the combination with label-free quantitation is useful and important already by itself as e.g. recognised by both reviewers #1 and #2.

      Reviewer #3: The authors should more thoroughly discuss previous works using ChIP-MS and dCas9 for locus-specific proteomics. This would give readers a better understanding of how the current work builds on and improves these earlier methods. For a paper that aims on presenting an optimized ChIP-MS workflow it is crucial to showcase in which use cases it outperforms previously published methods.

      E.g., compare locus-specific dCas9 ChIP-MS to CasID (doi.org/10.1080/19491034.2016.1239000) and C-Berst (doi.org/10.1038/s41592- 018-0006-2); how does your method perform in comparison to these?

      Response: Again, while we will now incorporate more extensive comparisons with previous ChIP-MS publications (and the few prior manuscripts that included dCas9) as well as related techniques, we would like to stress that dCas9 ChIP-MS is not the same approach as CasID and C-BERST, which rely on dCas9 fusions to BirA* and APEX2, respectively. dCas9-APEX2 strategies were also published by two additional groups as CASPEX (back-to-back with the C-BERST manuscript; PMID: 29735997) and CAPLOCUS (PMID: 30805613). All of these methods target specific loci with dCas9 and promiscuously biotinylate proteins that are in proximity to the dCas9-biotin ligase fusion protein. As described above, while the application of the BioID principle (PMID: 22412018) to chromatin regions has converging aims with the dCas9 ChIP-MS part of our manuscript, the two do not test the same thing. ChIP carries chromatin complexes through the entire workflow while the CasID approaches are independent of that. This is the same scenario as if we were to compare IP-MS reactions (such as the ChIP-MS reactions presented here for endogenous proteins) and BioID-type experiments for proximity partners of the same bait proteins.

      Reviewer #3: Compare likewise the described protein interactomes to previously published interactomes.

      Response: We will add comparisons in form of Venn diagrams with previously published interactomes. However, we would like to stress that a key aspect of our manuscript is the smaller yet rigorous hit lists based on e.g. loss-of-function controls, higher stringencies and specificity. Simply comparing final interactomes remains reductionist relative to the importance of other variables such as experimental design, number of replicates, data analysis etc.

      Reviewer #3: The authors use sgGAL4 as a control for the telomeric targeting of dCas9. The IF results (Fig3b) show that sgGAL4 barely localizes to the nucleus with very faint signals. It would be helpful to use a control with homogenous nuclear localization of dCas9 to further strengthen the author's conclusions.

      Response: dCas9-EGFP in the presence of sgGAL4 localises diffusely to the nucleus as expected. We have here used a very widely used non-targeting sgRNA control that was originally used for imaging purposes (PMID: 24360272) and has since been used in a variety of studies (e.g. PMID: 26082495, 32540968, 28427715) including a previous dCas9 ChIP-MS attempt (PMID: 28841410). In addition to the diffuse nuclear, non-telomeric localisation, we provide complementary validation of clean enrichment of telomeric DNA specifically in the sgTELO samples. Therefore, we do not see how other non-targeting sgRNAs would provide better controls or improve our data.

      Reviewer #3: The extrapolation of results from the use of telomeres as a proof-of-concept to other loci is not a given considering the highly repetitive structure of telomeric DNA. The authors should either be more cautious about generalizing the results to other loci or demonstrate that their method can also capture locus-specific interactomes at non-repetitive regions.

      Response: We agree that the adoption of any locus-specific approach to single genomic loci is a steep additional hurdle and warrants rigorous data on well characterised loci with very clear positive controls. We will expand on these challenges in our discussion. However, we would like to stress that we did not make any such statement in our original manuscript apart from simply referring to our telomeric experiment as proof-of-concept evidence that locus-specific approaches are feasible by ChIP.

      Reviewer #3: What are concrete biological insights from this optimized ChIP-MS workflow that previous methods failed to show?

      Response: We explicitly used telomeres as an extensively studied locus with clear positive controls that at the same time allows us to evaluate likely false positives. As such the intention of the manuscript was not to yield concrete biological insights but to develop a new methodological workflow.

      As also highlighted in a response to reviewer #2, based on other prior attempts to enrich telomeres in ChIP-like approaches with dCas9 (PMID: 28841410 & 29507191), it had been suggested that dCas9 “might exclude some relevant proteins from telomeres in vivo” (PMID: 32152500), implying that dCas9 ChIP-MS might inherently not be feasible including at repetitive regions such as telomeres. Therefore, recapitulating the set of well-described telomeric proteins was no trivial feat and our ChIP-MS workflow (both targeted and applied to individual proteins) represents a well-validated method to systematically interrogate changes in chromatin composition in the future. As one example at telomeres, this may include chromatin changes upon the induction of telomeric fusions or general DNA damage.

      Reviewer #3: For instance, the authors could compare their mouse and human TERF2 interactomes and discuss similarities and differences between both species.

      Response: We thank the reviewer for this suggestion, but the comparison between mouse and human TERF2 interactomes is not suitable across the datasets that we generated. U2OS is a human osteosarcoma cell line that relies on the Alternative Lengthening of Telomeres (ALT) pathway while our mouse data is based on embryonic stem cells (mESCs) and mouse liver tissue. Even the latter, in contrast to adult human tissue, expresses telomerase. We can certainly still pinpoint (as already done in our original manuscript) individual differences among known factors, e.g. the fact that proteins such as NR2C2 are more abundantly found at ALT telomeres (PMID: 19135898, 23229897, 25723166) vs. the detection of the CST complex as telomerase terminator (PMID: 22763445) in the mouse samples. However, the TERF2 datasets contain hundreds of proteins as “hits” above our cut-offs and a key message of our manuscript is that the majority of them are likely false positives. Here, differences are likely extending to expression differences between U2OS cells, mESCs and liver samples. So while appealing in theory, this cross data set comparison would remain rather superficial and error prone at this point. As a biology focused follow-up study, this would need to be rigorously conceived based on an appropriate choice of human and murine cell line models. In addition, this would likely require the generation of FKBP12-TERF2 knock-in fusion clones to allow for rapid depletion of TERF2 for a clean loss-of-function control since sustained loss of TERF2 leads to chromosomal fusions and eventually cell death in most cell types.

      Reviewer #3: The authors should also describe which interaction partners are novel and try to validate some of these using orthogonal methods.

      Response: We will now highlight more explicitly two proteins, POGZ and UBTF, that are most robustly and reproducibly enriched on telomeric chromatin across datasets, including the U2OS WT vs. ZBTB48 KO comparison (Fig. 2a). However, we would like to abstain from a molecular characterization at this point. As mentioned above, the discovery of novel telomeric proteins is not the focus of this manuscript, which is primarily dedicated to method development. In addition, these types of validations in methods papers are often limited to a few assays (e.g. can 1 or 2 proteins be enriched by ChIP? Do you see some localisation by IF? etc.). However, our research group has a history of publishing in-depth mechanistic papers on the characterisation of novel telomeric proteins (e.g. PMID: 23685356, 28500257, 20639181, doi.org/10.1101/2022.11.30.518500). Therefore, a genuine validation of such factors would require functional insights and clearly warrants independent follow-up work.

      Reviewer #3: Human Terf2 ChIP-MS (Fig1A) seems to be much more specific than the mouse counterpart (Fig1D) (32 TERF2 interactors out of 176 hits in human vs 12 TERF2 interactors out of 500 hits in mouse). Could the authors explain this notable difference?

      Response: As alluded to above, Fig. 1A and 1D cannot be directly compared, starting with the difference in complexity in the input material – cell line vs. tissue. For comparison, the Terf2 ChIP-MS data from mouse embryonic stem cells tallies up to 19 out of 169 hits, which is much closer to the U2OS results. Again, we deem the majority of hits from the TERF2 ChIP-MS data to be false-positives and the more complex input material from mouse livers likely accounts for the difference in these numbers.

      Reviewer #3: The authors used much higher cell numbers than previously published ChIP-MS experiments; while this is understandable for dCas9-based pulldowns, the cell number is expected to be down-scalable for the other IPs (TERF2, ZBTB48, MYB). Since this work primarily describes an optimized ChIP-MS workflow, the authors should show that they can reasonably downscale to at least 15 Mio cells per replicate; one way of achieving this could be through digesting on the beads and not in-gel.

      Response: As we will illustrate in the comparison table that was also requested by reviewer 2, our approach does not use higher cell numbers than previous ChIP-MS approaches – quite the contrary. In addition, we would like to highlight that while we state 50 million cells in Fig. 1a, we only inject 50% of our samples for MS analysis to retain a back-up sample in case of technical issues with the instruments. In other words, our workflow is already effectively based on 25 million cells and thereby pretty close to the requested 15 million cells while simultaneously requiring substantially less reagents.

      Importantly, our examples are based on rather lowly expressed bait proteins such as ZBTB48 (not detected within DDA-based proteomes of ~10,000 proteins in U2OS cells). While the workflow can be applied across proteins, exact input numbers might vary depending on the bait protein, e.g. histones and their modifications would likely require less input for the same absolute sample enrichment. For instance, PMID 25990348 and 25755260 performed ChIP-MS on common histone modifications but still used 300-800 million cells per replicate. Considering that we worked on substantially less abundant proteins, we here present a workflow with comparably low input samples.

      Reviewer #3: It is not clear from the text or figure what the authors are trying to show in Fig2c. They should either explain this further or take the figure out.

      Response: We are trying to illustrate the following: As in any IP reaction the bait protein is the most enriched protein with very high relative intensities, e.g. TERF2 in the TERF2 ChIP-MS data. Direct protein interaction partners – here the other shelterin members – follow at about 1 order of magnitude lower signal intensities. In contrast, proteins that are enriched via an interaction with the same DNA molecule (i.e. that do not physically interact with the bait protein) such as NR2C2, HMBOX1 and ZBTB48 further trail by at least 1 more order of magnitude. This information is not easily visualised within the volcano plots and is mainly “buried” within the Supplementary Tables. However, the relative intensities displayed in Fig. 2c clearly illustrate the dynamic range challenge that ChIP-MS poses for proteins that independently bind to the same chromatin fragment. We have now modified our text to make this point clearer.

      Reviewer #3: Was there any benefit in using a Q Exactive HF vs timsTOF flex?

      Response: Yes, measuring the same samples (e.g. the 50% backup mentioned above) on both instruments enriches more telomeric proteins/shelterin proteins in e.g. the dCas9 ChIP-MS data set on the timsTOF fleX. However, given the difference in age of these instruments/technologies between a Q Exactive HF and a timsTOF fleX (in the context of these experiments the equivalent of a timsTOF Pro 2), this is not a fair comparison beyond concluding that a more recent instrument like the timsTOF fleX achieves better coverage and is more sensitive with otherwise comparable measurement parameters. As we did not have the opportunity to run matched samples on e.g. an Exploris 480, we would not want to make claims across vendors. As stated in the discussion we are expecting that even newer generation of mass spectrometers, such as the very recently released Orbitrap Astral or timsTOF Ultra would further improve the sensitivity and/or allow to reduce the amount of input material. Therefore, the main conclusion is that improvements in the mass spec generations improve proteomics data quality and our samples are no exception, i.e. this is not specifically pertinent to our approach.

      Reviewer #3: How did the authors analyze the PTM data? This is not described in the methods section. In addition, it would be important to validate the novel PTMs described for NR2C2.

      Response: We apologise for the oversight and we will add the description of PTMs as variable modifications during our MaxQuant search in the methods section. The originally deposited datasets already include this and we had simply missed this in our methods text.

      While we are not 100% sure that we understand the request for validation correctly, we would like to point out that the PTMs on NR2C2 have been previously reported in several high-throughput datasets and for S19 in functional work on NR2C2 (PMID: 16887930). However, the relevance in our data set is as follows: While the PTMs on TERF2 as the bait protein could occur both on telomere-bound TERF2 as well as on nucleoplasmic TERF2, NR2C2 is only enriched in the TERF2 ChIP-MS reactions due to its direct interaction with telomeric DNA. The co-detection of its modifications therefore implies that at least some of the telomere-bound NR2C2 carries these modifications. We showcase this example as an additional angle of how such ChIP-MS datasets can be analysed.

      While the robust, MS2-based detection of these modified peptides in our data set and several other publicly available datasets provides strong evidence that these modifications are genuine, further functional validation would involve rather labour-intensive experiments and resource generation (e.g. phospho-site specific antibodies). We hope that the reviewer agrees with us that this would require an independent follow-up study and that this goes beyond the scope of our current manuscript.

      Reviewer #3: For this kind of methods paper one would expect to see the shearing results of the ChIP-MS experiments, since variations in DNA shearing can impact the detection of false-positives in the ChIP-MS experiments.

      Response: We will include agarose gel pictures of our sonicates, which we indeed routinely quality controlled prior to ChIP experiments as stated in our methods description.

      Reviewer #3: Overall, the current state of the manuscript neither provides direct evidence that the "optimized" ChIP-MS workflow is better in certain aspects/use cases than previously published methods nor does it provide novel biological insights. At the current state it even cannot be considered as a validation of previously published methods since it does not discuss them.

      Response: We politely disagree with this conclusion. Again, as mentioned above we are under the impression that this reviewer somehow equates our entire manuscript to a comparison with dCas9-biotin ligase fusions.

      Instead, we here provide a workflow for ChIP-MS that incorporates label-free quantification as the experimentally easiest, most intuitive quantification method for non-mass spectrometry experts. This offers a particularly low barrier to entry aimed at making ChIP-MS more widely accessible as a complement to commonly used ChIP-seq applications. Furthermore, we showcase that as a gold standard ChIP-MS – to truly live up to its name – should have the ability to enrich proteins independently binding to the same chromatin fragment. We demonstrated that double cross-linking is critical for these assays and in return illustrate how rigorous loss-of-function controls (both KOs and degron systems) can mitigate prevalent false-positives that are exacerbated due to the cross-linking. Finally, we applied this workflow to different types of endogenous proteins (transcription factors, telomeric proteins) in cell lines and tissue and extend our work to dCas9 ChIP-MS as a targeted method.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements

      We thank the Reviewers for their detailed and constructive comments. As we describe below, we have now amended the manuscript to address their concerns and suggestions.

      2. Point-by-point description of the revisions

      Reviewer #1

      __In the first paragraph the reviewer states that our study is well presented and convincing, but that it seems “an incremental advance to the previous ones, which properly accounted for PLK4 symmetry breaking and are based on similar assumptions”.__ We apologise for not explaining properly why our work is an important advance on these previous studies. Although both previous models can account for some aspects of PLK4 symmetry breaking, they both have significant issues. For example, Takao et al. perform no analysis of the robustness of their model, and from the small number of simulations shown it is clear that some very odd behaviours emerge, e.g. the oscillation of the dominant PLK4 site around the 6 compartments (Figure 3C, Example 3) and the bizarre manner in which PLK4 overexpression drives the formation of multiple PLK4 peaks (Figure 4B, first two examples). The authors do not comment on, analyse, or explain these strange phenomena. This model also relies on STIL being added to the system only after PLK4 has already broken symmetry; this is not plausible in rapidly dividing systems such as the fly embryo where Ana2/STIL levels remain constant through multiple rounds of centriole duplication (Steinacker et al., JCB, 2022). The Leda et al. model predicts that inhibiting PLK4 kinase activity will deplete PLK4 from the centriole, but it is now clear that PLK4 accumulates at centrioles when its kinase activity is inhibited (e.g. Yamamoto and Kitagawa, Nat. Comms., 2019). Moreover, this model supposes no spatial relationship between PLK4-binding compartments; this has important implications for the system’s behaviour (see point 1 in our response to Reviewer #2), and is biologically highly implausible. Thus, neither of the previous models can properly account for several important aspects of PLK4 symmetry breaking.

      Moreover, the two previous studies are not based on similar assumptions. It is only through our analysis that we discover that the underlying biological process driving symmetry breaking in both previous models can be described in the same terms: with short-range activation and long-range inhibition causing diffusion-driven instability. This crucial conclusion was not obvious from, nor claimed by, either of the previous publications. We believe this is an important step in model development for these systems.

      __The reviewer raises a number of minor concerns, the first of which concerns a previous study from Chau et al. (Cell, 2012), which studies how two-component systems break symmetry. Differential diffusion is not essential for symmetry breaking in some of the models considered by Chau et al., and so they wonder if it is really essential in our system.__ We thank the reviewer for pointing us to this study. It can be proven mathematically that differential diffusion is essential for symmetry breaking in the Turing-type framework. In the systems studied by Chau et al., symmetry can be broken without differential diffusion if one of the two components can be depleted from the cytoplasm. Such cytoplasmic depletion does not occur in traditional Turing-type systems, and it is almost certainly not occurring during PLK4 symmetry breaking; for example, FRAP experiments show that PLK4 continuously turns over at centrioles (Cizmecioglu et al., JCB, 2010; Yamamoto and Kitagawa, Nat. Comms., 2019). We discuss this point (p8, para.3).
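
      The statement that differential diffusion is essential in a Turing-type framework can be illustrated with a standard linear stability check for a generic two-component reaction-diffusion system. The sketch below is not the model from the manuscript: the Jacobian entries, diffusivities and function names are illustrative assumptions, and the depletion-based mechanism of Chau et al. is outside its scope.

```python
import numpy as np

def max_growth_rate(jacobian, d_u, d_v, wavenumbers):
    """Largest real part of the eigenvalues of J - k^2 * D over all given k."""
    J = np.asarray(jacobian, dtype=float)
    D = np.diag([d_u, d_v])
    return max(np.linalg.eigvals(J - k**2 * D).real.max() for k in wavenumbers)

def is_turing_unstable(jacobian, d_u, d_v, wavenumbers):
    """True if the state is stable when well mixed but unstable with diffusion."""
    J = np.asarray(jacobian, dtype=float)
    stable_when_mixed = np.trace(J) < 0 and np.linalg.det(J) > 0
    return bool(stable_when_mixed and max_growth_rate(J, d_u, d_v, wavenumbers) > 0)

# Hypothetical activator-inhibitor Jacobian at the uniform steady state:
# u activates itself and v; v inhibits u and itself.
J = [[1.0, -2.0],
     [2.0, -3.0]]
ks = np.linspace(0.0, 3.0, 301)

print(is_turing_unstable(J, d_u=1.0, d_v=20.0, wavenumbers=ks))  # slow activator, fast inhibitor
print(is_turing_unstable(J, d_u=1.0, d_v=1.0, wavenumbers=ks))   # equal diffusivities
```

      With equal diffusivities, diffusion only shifts both eigenvalues of the Jacobian by the same negative amount, so a steady state that is stable when well mixed stays stable. An instability requires the inhibitor to diffuse sufficiently faster than the activator, which is the short-range activation and long-range inhibition scenario.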

      __The reviewer states that it is unclear which term in equations (3-4) and (5-6) corresponds to the self-activation and activation/inhibition of the other component that are indicated in the schematic summary of the models shown in Figure 1C. __As we now clarify, it is not always possible to pinpoint a single term in an equation that corresponds to activation/inhibition. Mathematically, positive feedback on a species means that the relevant partial derivative of its reaction term is positive, and negative feedback means that this derivative is negative. Hence, activation and inhibition can change during the dynamics, depending on the values of these derivatives, as these inequalities may be achieved with complex expressions that extend beyond the usual proportional relationships. We have amended the manuscript to make this clearer (p10, para.2).

      The reviewer pointed out an error in the arrows in Figure 2 (we believe this is actually Figure 4). We thank the reviewer for pointing this out and have now corrected this mistake.

      Reviewer #2

      Major Comments:

      __ 1. The reviewer points out that in all models of PLK4 symmetry breaking the overexpression of PLK4 should be able to generate multiple PLK4 peaks (as, experimentally, PLK4 overexpression can generate up to 6 procentrioles around the mother centriole). The Reviewer suggests that the two previous models can do this, but we only show examples where PLK4 overexpression generates two peaks, and the reviewer questions whether this is a general limitation that would invalidate our approach. __We are grateful to the reviewer for pointing this out, and we now expand our analysis and discussion of this important issue (p13-15). It is indeed possible to produce more peaks in our model using different parameters—e.g. decreasing diffusivity leads to thinner peaks, allowing more peaks to form (Figure 3B, Figure 5B). Importantly, however, when diffusion is decreased, the region of the parameter space in which only a single peak will form inevitably becomes smaller—as diffusion can no longer efficiently suppress the formation of additional peaks around the rest of the centriole surface. Hence, in both our original models we struggled to find a parameter regime in which PLK4 robustly formed a single peak, but also formed >3 peaks when PLK4 was overexpressed. As we now discuss in detail, we believe that this is a general problem, as any model of PLK4 symmetry breaking must involve information being communicated around the centriole surface. We now show that a possible solution to this problem is to postulate that increasing PLK4 levels leads to a decrease in PLK4 diffusivity (Figure 3C, Figure 5C)—a biologically plausible possibility (p15, para.2).

      In addition, it is not correct to say that the previous formulations of these models do not have this problem (or, in the case of Leda et al., the model actually has a related problem). This problem must apply to the Takao et al. model, as it also relies on information travelling around the centriole surface. This problem is far from obvious, however, because Takao et al. do not analyse the robustness of their model. This problem does not apply to the Leda et al. model, but this is because their model supposes no spatial relationship between the individual compartments and instead assumes that communication between compartments is instantaneous. This allows their system to overcome this communication problem and so robustly form a single peak at low PLK4 concentrations, while forming multiple peaks at high concentrations (as shown in Figure 6B). However, this requires that diffusion is sufficiently fast that concentration gradients are negligible between centriolar compartments, but not so fast that the relevant species are diluted in the much larger cytoplasm. It seems implausible that both of these effects may be achieved with a single diffusion rate in the real-world physical system.

      __ 2. The reviewer points out that in our modelling any multiple PLK4 peaks formed will tend to be evenly spaced around the centriole surface whereas, in their original formulations, the two previous models predict that any multiple ‘winning’ PLK4 compartments will not have any preferential spatial location with respect to each other. They ask that we address this difference and justify why we think our prediction is a better representation of PLK4 symmetry breaking. __Although it is not obvious, neither of the previous models makes clear predictions about the spacing of multiple PLK4 peaks. As described above, Leda et al. assume no spatial relationship between PLK4-binding compartments, so relative peak-spacing cannot be assessed. Moreover, from the limited analysis shown, it is not clear that Takao et al. predict random spacing. The authors show only two simulations of PLK4 overexpression (Figure 4B, first two simulations) and the behaviour of PLK4 is very odd: the initial noise in the system fades away before PLK4 levels rapidly and near-simultaneously rise at multiple, reasonably well-spaced, peaks, before fading away to low levels—even after STIL addition. At the end of the simulation the “winning” compartments contain very low levels of PLK4 (often lower than the noise initially introduced into the system), but these compartments are reasonably (simulation 1) or very (simulation 2) evenly spaced.

      Nevertheless, the reviewer is correct that the even spacing of multiple peaks is a feature of our model. Unfortunately, it is not possible to compare this prediction to reality because the spacing of multiple PLK4 peaks in cells overexpressing PLK4 has not been quantified yet. Thus, one has to interpret published images, some of which support equal spacing while others do not (e.g. Kleylein-Sohn et al, Dev. Cell, 2007). Moreover, this analysis is likely to be complicated because CEP152 can form incomplete rings. This can be appreciated in Figure 2C in Hatch et al., (JCB, 2010) where the extra centrioles induced by PLK4 overexpression do not appear to be evenly spaced around the centriole, but are quite evenly spaced around the partial CEP152 ring. Therefore, equal spacing of peaks in ideal conditions is a feature predicted by our model that still needs to be fully explored experimentally. We believe that part of the power and value of our model is to suggest such hypotheses. We now discuss this important point (p26, para.2).

      __ 3. The reviewer questions our attempt to discretise our continuum model (where we convert the continuous centriole surface to a series of discrete compartments on the centriole surface and show that symmetry breaking can still occur). They note that we only show one example (9 compartments), they ask for more information about how the discretisation was done, and they question the independence of the compartments as PLK4 appears to accumulate in compartments adjacent to the dominant compartment. __We apologise for the lack of clarity here. We now state that our models can break symmetry provided that there are at least two compartments, and we now include simulations showing that this happens for 2 – 10 compartments (Figure S2). The discrete model is a finite-difference discretisation of the continuum model (described in Appendix V). We also now clarify that the compartments are ‘independent’ in the sense that all chemical reactions only occur between components that are within the same compartment. The compartments are still spatially linked via a discretized diffusion (as would likely be the case at the centriole), which explains the observed relationship between neighbouring compartments.
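The idea that compartments react independently but remain coupled by discretised diffusion can be illustrated with a minimal sketch. It assumes a standard second-order finite-difference Laplacian on a periodic ring; the function names, the unit spacing, and the explicit Euler step are illustrative choices, not the scheme described in Appendix V.

```python
def ring_laplacian(values, spacing=1.0):
    """Finite-difference Laplacian on a ring: each compartment sees only its two neighbours."""
    n = len(values)
    return [
        (values[(i - 1) % n] - 2.0 * values[i] + values[(i + 1) % n]) / spacing**2
        for i in range(n)
    ]


def diffusion_step(values, diffusivity, dt, spacing=1.0):
    """One explicit Euler step of pure diffusion (hypothetical helper; reactions omitted)."""
    lap = ring_laplacian(values, spacing)
    return [v + dt * diffusivity * l for v, l in zip(values, lap)]


# Nine compartments with all material initially in compartment 0.
profile = [1.0] + [0.0] * 8
after_one_step = diffusion_step(profile, diffusivity=1.0, dt=0.1)
```

In this sketch, material leaving the dominant compartment appears first in its two neighbours, which is the kind of neighbour coupling described above, while the total amount on the ring is conserved by the diffusion step.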

      __ 4. The reviewer asks whether all the parameter values that satisfy the mathematical constraints we calculate for our models will break symmetry. If so, they suggest we are using a circular argument when demonstrating that the models break symmetry as we use parameter values chosen specifically to satisfy these constraints. __In Turing-systems, one can mathematically calculate parameter constraints that allow symmetry breaking. As we now clarify, all parameters that satisfy these constraints can break symmetry, while any parameters outside these constraints cannot break symmetry. Thus, it was never our intention to claim something new or surprising when we illustrated the symmetry-breaking properties of our models (Figures 2 and 4, and associated parameter space analysis in Figures 3 and 5), so we apologise that our intention on this point was unclear. Rather, these Figures illustrate the detailed behaviour of each system under different conditions—something that is not possible to intuit from the equations alone.
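As an illustration of how such constraints can be checked for a given parameter set, the sketch below tests the textbook Turing conditions for a generic two-species system. The function name and arguments are hypothetical: `f_u`, `f_v`, `g_u`, `g_v` stand for the partial derivatives of the two reaction terms at the uniform steady state, and `d_u`, `d_v` for the diffusivities. It assumes an unbounded domain; on a finite ring an admissible wavenumber must also fall inside the unstable band.

```python
def turing_unstable(f_u, f_v, g_u, g_v, d_u, d_v):
    """Return True if the uniform state is stable without diffusion but unstable with it."""
    trace = f_u + g_v
    det = f_u * g_v - f_v * g_u
    # Condition 1: the well-mixed system is linearly stable.
    stable_without_diffusion = trace < 0 and det > 0
    # Condition 2: some spatial mode grows once diffusion is included.
    mixed = d_v * f_u + d_u * g_v
    unstable_with_diffusion = mixed > 0 and mixed**2 > 4.0 * d_u * d_v * det
    return stable_without_diffusion and unstable_with_diffusion


# Illustrative numbers only: a self-activating u and a self-inhibiting v.
with_ratio = turing_unstable(1.0, -2.0, 2.0, -3.0, d_u=1.0, d_v=2500.0)
equal_diffusion = turing_unstable(1.0, -2.0, 2.0, -3.0, d_u=1.0, d_v=1.0)
```

With these made-up derivative values, the check passes when the second species diffuses much faster than the first and fails when the two diffusivities are equal.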

      5. The Reviewer requests more information about how we chose the particular parameter values we use to illustrate each model and asks that we convince readers that other sets of values that satisfy the derived mathematical requirements would result in the same qualitative outcomes. As described in point 4 above, and as we now state more clearly, it is a mathematical fact that parameter values that satisfy the derived mathematical requirements can break symmetry. We now discuss our reasons for choosing specific parameters in more detail (see point 6, below).

      __ 6. The Reviewer asks whether the dimensionless parameters we use in our models have any biological relevance, and requests a biological interpretation of all of them. They also request that we relate the Diffusivity ratios of the Activator and Inhibitor species to the experimental observations made by Yamamoto and Kitagawa. __Relating our dimensionless parameters to biologically-relevant dimensional parameters is a complex issue. For example, one can see from equations (5) and (6) that simultaneously doubling (A), (I), and (a), and decreasing (b) by a factor of 4 leaves the system unchanged. Since the concentrations of A and I are unknown at the centriole surface, this means that it is not possible to determine the dimensional values of the rate of production of I (a) and its rate of conversion to A (b). This limitation is the root of the mathematical fact that FRAP experiments can reveal “off” rates but not “on” rates. Moreover, to convert the rate of loss of A (c) and I (d) into dimensional parameters it is necessary to know the timescale of symmetry-breaking. This is unknown, but was assumed to be on the order of hours in the previous models. This corresponds to a degradation/loss rate of minutes with our current choice of parameters, which is consistent with FRAP data (e.g. Yamamoto and Kitagawa, Nat. Comms., 2019). Regarding the diffusivity ratio, the effective diffusion in our model depends on both the bulk diffusion and the binding/unbinding/degradation rates, a complexity also noted by Yamamoto and Kitagawa. This makes it very difficult to relate the “effective” surface diffusivity to the bulk diffusivity. We are currently investigating the form of this dependency, but this is a complex mathematical problem that is beyond the scope of this manuscript.
      These issues are difficult to discuss succinctly, so we now simply state that we chose specific parameter values based, in part, on the values and ratios used in the previous modelling papers (p10, para.2; p17, para.2).

      Unfortunately, we could not find any experimental measurements of diffusivity in the Yamamoto and Kitagawa paper, as the Reviewer suggests. We now clarify, however, that the ratio we use in both models (2500) is chosen to be between the effective diffusivity ratio (as the previous models used binding/unbinding rates rather than diffusivity) used by Takao et al. (10000) and Leda et al. (200). We also include a phase diagram showing how varying the diffusivity of both factors influences symmetry breaking in both models (Figure 3B, Figure 5B), and we state that we have chosen all remaining parameter values to reflect the parameter values in the original models, when adjusted to the same timescale.

      __ 7. The Reviewer asks for more information about how we normalised time in our simulations and whether the time in different simulations is comparable. __We now clarify that the simulations run for a single unit of dimensionless time (so they can be compared), and that the reaction/diffusion parameters in the system are sufficiently large by comparison with unity that all simulations achieve steady state within a unit of time (p11, para.2).

      __ 8. The Reviewer asks whether concentrations of the activator and inhibitor species can be compared between simulations, and also questions our description of the activator being uniformly accumulated in Figure 4D, rather than uniformly depleted. __We clarify that concentrations can be compared within a model, but not between models. This is because the dimensional values depend on the dimensional reaction rates, which differ between the models. This is not just a theoretical limitation; experimental fluorescence signals are typically compared in relative arbitrary units so the absolute values of different systems cannot be easily compared for the same reason. We agree with the reviewer that it is better to describe Figure 4D as showing uniform depletion of the activator, and we have adjusted the legend accordingly.

      The reviewer makes a number of minor points that are not numbered.

      __The reviewer asks for clarification of what we mean by “robustness”: does this refer to the ability to produce the same result in multiple simulations, or to the ability to produce the same result when parameter values are varied? If the latter, then the reviewer suggests our models are not very robust. __We apologise for this confusion and now more clearly define what we mean by robust (p13, para.2). As we discuss in point 1 of our response to this Reviewer, our initial models are indeed not very robust at producing a single PLK4 peak over a range of PLK4 concentrations. We now discuss why this lack of robustness is likely to be intrinsic to any PLK4 symmetry breaking system, and how robustness in all such models can be improved by allowing diffusivity to vary with PLK4 expression levels (p13-p15).

      __The Reviewer points out that the original models introduce a noise term at every iteration, whereas we only introduce an initial noise term; they ask us to discuss this difference. __We have run simulations introducing a noise term at every iteration and find that this makes negligible difference (Reviewer Figure 1, attached to the end of this letter). We do not take this approach, however, as this would significantly complicate the mathematical analysis that we perform (the additional noise term turns the system of PDEs into a system of SDEs which do not fit the Turing framework as readily). We now mention this in Appendix V.

      The Reviewer states that the reaction schemes are unnecessarily repeated in Figures 1, 2 and 4. We would like to keep these schematics, as in Figure 1 we show a generic scheme (illustrating the two possible Turing-type reaction diffusion systems) whereas in Figures 2 and 4 we show specific reaction regimes (specifying the relevant species) that we test in each model. We feel this information will be useful to readers in this visual format.

      The Reviewer states that it is confusing that we refer to the specific reaction parameters (k11 and k12) that need to be swapped to convert the Leda et al. model to the Takao et al. model, as this information will not mean anything to readers who are not familiar with the models. We agree and have now removed this information.

      The Reviewer suggests several textual amendments and/or corrections. We thank the reviewer for spotting these and have amended them all accordingly.

      __Finally, the Reviewer states in their significance summary that although our key conclusions are convincing, they are not new as Takao et al. describe their model as analogous to a “reaction-diffusion system (also known as a Turing model)”. __We were aware that Takao et al. make this statement, but this does not invalidate the novelty or significance of our work. This is because although Takao et al. described their model as being analogous to a “Turing model”, it is not actually a reaction-diffusion system, and it does not exhibit the property of long-range inhibition that is central to all Turing-systems to produce a single PLK4 peak. Instead, they use lateral inhibition (in which the influence of the inhibiting species does not extend beyond the neighbouring compartments) to reduce the number of potential PLK4 binding sites from ~12 to ~6. A single winning site is subsequently selected when STIL is added to the system—with additional positive feedback (not involving reaction-diffusion) ensuring that the compartment with most PLK4 becomes the dominant site. Their analysis of the reaction-diffusion version of their system is limited to a single supplementary figure (Figure S2D), and they do not perform or refer to any of the relevant mathematical analyses of their model that makes these well-studied systems such powerful tools. We believe that the model presented here is simple enough to draw the attention of the applied mathematics community while robust and complete enough to provide a mechanistic explanation of many interesting features and suggest new possible phenomena. We now discuss these points (p22, para.1).

      Reviewer #3

      __The Reviewer found our manuscript well-written, and judged it of interest to centriole duplication enthusiasts. __We interpret this to mean that the Reviewer did not think it of more general interest. This seems a harsh assessment, as the precise one-for-one duplication of centrioles is generally considered to be one of the great mysteries of cell biology. It is now widely appreciated that robustly breaking PLK4 symmetry to form a single PLK4 peak is crucial to this process. Thus, our discovery that this process can be described using a well-studied mathematical framework that has already been applied to a vast range of biological processes is potentially of significance even to non-centriole enthusiasts.

      The Reviewer made a number of specific comments:

      Figure 1. The Reviewer felt the graphic in Figure 1A could be improved by combining it with Figure 1B, and noted that the centrioles look strange. We thank the reviewer for these suggestions and we have now rearranged this Figure. We also now clarify that the schematic depicts Drosophila centrioles, which are simpler than human centrioles.

      __Figure 2. The Reviewer suggests that to make the system depicted in Figure 2A fit as a Type I Turing system we have to assume that (I) must dissociate from the centriole or be degraded at higher rates than (I) converts (A) to (I). They suggest this assumption is implicit in the model and they request further explanation. __The reviewer is correct that, in Model 1, the degradation/dissociation of (I) is the root of its self-inhibition. However, we do not need to make any assumption about the relationship between the rate at which (A) converts to (I) (b), and the dissociation/degradation rate of (I) (d) for this system to work (as the Reviewer implies). This is because, whatever these rates are, the system will approach a steady-state where the production and degradation terms balance, and it is the stability/instability of this state that determines whether the system can break symmetry. Since the degradation rate of (I) (the negative term in equation 4) increases more rapidly than its production rate (the positive term in equation 4) as (I) increases, this results in a stable (i.e. self-inhibiting) system regardless of the parameter values. We have rewritten the sections explaining these equations to try to make these points more clearly and to point readers to Appendix II where we explain the form of the equations.

      __The Reviewer asks if in Model 1 it is realistic to assume no turnover or loss of PLK4 (A), and will the system still work if this is altered? __This is a good point. In Model 1, we set c=0 as this makes the analysis significantly simpler, enabling us to display the mathematical predictions alongside the numerical simulation. We have now added the (c,d) phase diagram to show the effect of varying these parameters on the symmetry breaking properties of the system (Figure 3D). We find that the value of c has a relatively weak effect on the symmetry breaking properties of the model since it does not affect the function of (A) as an activator.

      __The Reviewer asks if our 1D model would work in 2D, and notes the PLK4 peaks in our models are broad, likely limiting the number of peaks formed. They also note that in our Model 1 it is the unphosphorylated form of PLK4 that accumulates in the peak, which seems unlikely as it is widely believed that PLK4 must be active to phosphorylate STIL to promote its interactions with SAS6 and CPAP. __From a mathematical perspective, modelling our system in 2D would produce very similar results. Symmetry breaking is driven by long-range inhibition/short-range activation, and these behaviours will work analogously in 2D. As discussed in our response to Reviewer #2 (point 1), the broad peaks do indeed limit the number of centrioles that can form, but by altering the parameters we can generate more peaks that are less broad (Figures 3 and 5). The Reviewer is correct that Model 1 (based on Takao et al.) predicts that non-phosphorylated PLK4 (A) accumulates in the peak. This is also true of the original Takao et al. model, although this was not highlighted or commented on by the authors. We now expand our discussion of this point (p25-p26).

      The Reviewer asks if our models can form multiple peaks at higher PLK4 levels. This is again related to Reviewer #2, point 1, and we now show that this is indeed possible under the appropriate parameter regime (Figure 3C and Figure 5C).

      The Reviewer asks for more description of how lateral diffusion works in our system. For example, do we consider that not every molecule of (I) will diffuse laterally (as some will be lost to the cytoplasm), or that the probability of a molecule leaving the surface will increase as distance/time increases. We apologise for our lack of clarity. We now state that the proportion of molecules not rebinding to the surface is accounted for in the reaction components of all our models (p7, para.1). In reality, and as we now state, the relationship between this loss and the diffusion rates (and their relation to distance/time, for example) is complicated. We are investigating this relationship in more detail, but this is beyond the scope of the current paper.

      The Reviewer asks if symmetry breaking might eventually occur if the system in which we reduce the kinase activity of PLK4 (Figure 2D) were given more time. They also ask whether reducing PLK4 levels by half would lead to a failure in site-selection. The kinase inhibited scenario we show here will not break symmetry over any period of time; this can be proven mathematically, and is verified in the numerical simulations (Figure 3A and 5A, bottom left regions of graphs), which we now state more clearly are always run for a long enough period to reach a steady-state (p11, para.2). The effect of reducing PLK4 levels in our models is analysed in the phase diagrams shown in Figure 3 and 5 (and analysed in more detail in Figure S1), where it can be seen that there are multiple PLK4 concentrations that can be halved without a failure in site selection (although, see also our response to Reviewer #2, point 1).

      The Reviewer pointed out some errors in our presentation of Figure 3, (and suggested some improvements in presentation in a point further below) and also asked for more information about the parameters used to generate the data in Figures 2B-D and 4B-D. We thank the Reviewer for these suggestions and have made these changes and provided the additional information requested (e.g. marking the specific parameters used in our simulations on the phase diagrams shown in Figure 3 and Figure 5 with coloured dots).

      The Reviewer points out that when PLK4 levels and activity are both high no centrioles are produced in Model 2, whereas 1 centriole is produced in Model 1—neither of which are consistent with experimental observation. We now show an expanded parameter space (new Figures 3A and 5A) where it can be seen that this is not a problem for Model 1. For Model 2, the region of high kinase levels and activity (dark blue, top right, Figure 5A) corresponds to the uniform accumulation of the activator species. Thus, while there are no peaks, this region might produce multiple centrioles, as it is equivalent to a compartmental model in which all of the compartments are occupied. We now discuss this point (p19, para.1).

      __The Reviewer questions how the biology fits a Type II Turing system, pointing out that current data suggests that active PLK4 turns over more rapidly at centrioles, whereas in the Type II model we describe (based on the Leda et al. model) it is the phosphorylation state of STIL that determines which species of PLK4:STIL turns over rapidly. They also question the logic of the Model 2 Type II circuit (Figure 3A), questioning how A could drive the dephosphorylation of STIL to promote the production of I. __We agree that current data is more consistent with phosphorylated species of PLK4 turning over more rapidly at centrioles, but this is not what Leda et al. proposed, and so this is not what we implemented in trying to reformulate their model (although this is effectively the change we make that turns the Leda et al. model into the Takao et al. model). As to the second point, the Reviewer has correctly spotted a problem with our model that arises because the direction of the arrows linking A and I was inadvertently flipped in Figure 4A. This mistake has been corrected, and we now explain more clearly how the biology of this system fits a Type II Turing system in the legend.

      __The Reviewer points out that although we can convert the Leda et al. Model (Model 2) to the Takao et al. Model (Model 1) simply by changing the identity of the activator and inhibitor species, the underlying assumption of the Takao et al. model (that non-phosphorylated PLK4 promotes its own accumulation) was not an inherent assumption of the Leda et al. model. __We apologise for this confusion. As we now clarify (p20, para.1), the Reviewer is correct that when we make mathematical changes to the Leda et al. model we must also assume changes in the underlying biology, so that non-phosphorylated species of PLK4 are now slow diffusing (rather than non-phosphorylated species of STIL, as originally proposed). As the Reviewer points out, current data suggests that non-phosphorylated species of PLK4 do turn over more slowly, although it is not clear why; for example, liquid-liquid phase separation driving the formation of PLK4 condensates has been postulated, but is far from proven. This remains an interesting problem that will be further probed mathematically and experimentally.

    1. Author Response:

      The following is the authors' response to the original reviews.

      We thank both reviewers for their comments, which have suggested changes that have improved the manuscript.

      Reviewer #1 (Public Review): 

      […] A weakness in the methodology is the link to tissue tension and conclusions about tissue mechanics. Methods that directly affect tissue tension and a more thorough and systematic application of laser ablation experiments would be needed to profoundly investigate mechanosensation and consequential effects on tissue tension by the various genetic perturbations.

      Response: In revision, we have added some additional experiments that examine altered tension.

      While the in-silico analysis of competing for F-actin binding sites for βH-Spec and myosin appears logical and supports the authors' claims, no point mutation or truncations were used to test these results in vivo.

      In its current structure the manuscript's strength, the genetic perturbations, is compromised by missing clear assessments of knockdown efficiencies early in the manuscript and other controls such as the actual effect on myosin by ROCK overactivation. 

      Response: In revision, we reorganized the manuscript and figures to document the knockdown efficiency earlier in the manuscript, and have added additional figure panels illustrating the effects of altered tension on myosin levels.

      Reviewer #2 (Public Review):

      […] The authors suggest that Ajuba is required for the effect of beta-heavy spectrin. However, it is still formally possible that this could be a parallel pathway that is being masked by the strong phenotype of Ajuba RNAi flies. 

      Response: While it is formally true that the genetic requirement for Jub could reflect a role in parallel to, rather than downstream of, spectrins, our conclusion that spectrins act through Jub is based not only on the genetic requirement for Jub, but also on the influence of spectrins on junctional tension and Jub localization, which indicate that spectrins influence Jub activity in a manner consistent with their affecting the Hippo pathway through Jub.

      One of the major points of the manuscript is the observation that alpha- and beta-heavy-spectrin are potentially working independently and not as part of a spectrin tetramer. This is mostly dependent on the observation that alpha- and beta-heavy-spectrin appear to have non-overlapping localizations at the membrane and the fact that alpha- and beta-heavy-spectrin localize at the membrane seemingly independently. It is not entirely obvious that a potential lack of colocalization and the fact that protein localization at the membrane is not affected when the other partner is absent is sufficient to argue that alpha- and beta-heavy-spectrin do not form a complex. Moreover, it is possible that the spectrin complexes are only formed in specific conditions (e.g. by modulating tissue tension). 

      Response: Our results argue that alpha- and beta-heavy-spectrin do not form a detectable complex in the wing disc under the conditions examined, and thus that they act independently in this context. However, we agree that it is possible that they could function together in other contexts, eg in other tissues or under different conditions, and we have revised the text in the Discussion to note this.

      If indeed spectrins function independently, would it not be expected to see additive effects when both spectrins are depleted? 

      Response: Not necessarily, since both alpha- and beta-heavy-spectrin act through Jub, and there may be a limit as to how much Yki activity can be increased by Jub (eg the increases in wing size induced by spectrin RNAi are similar to the increases in wing size observed with constitutive recruitment of Jub through alpha-catenin mutation (Alegot et al 2019)).

      Related to the two previous points, the fact that the authors suggest that both alpha- and beta-heavy-spectrin regulate Hippo signaling via Ajuba would be consistent with the necessity of an alpha- and beta-heavy-spectrin complex being formed. How would the authors explain that both spectrins require Ajuba function but work independently? 

      Response: The different spectrins both affect Jub because they both affect cytoskeletal tension, but our results suggest that they act in different ways to affect tension. We have made some revisions to the Discussion section to try to make this clearer.

      Another major point of the manuscript is the potential competition between beta-heavy-spectrin and myosin for F-actin binding. The authors suggest that there is a mutual antagonism between the two proteins regarding apical F-actin. However, this has not been formally assessed. Moreover, despite the arguments put forward in the discussion, it seems hard to justify a competition for F-actin when beta-heavy-spectrin seems to be unable to compete with myosin. Myosin can displace beta-heavy-spectrin from F-actin but the reciprocal effect seems unlikely given the in vitro data. 

      Response: We show in vivo, in vitro, and in silico data that are all consistent with the inference that beta-heavy-spectrin and myosin compete for binding to F-actin. As the reviewer notes, and as we discuss, the in vitro competition experiments were limited because, for technical reasons, we were unable to increase the protein concentrations further. We also note that our in vitro experiments used an active form of myosin, which binds F-actin much more strongly than inactive myosin.

      Reviewer #1 (Recommendations For The Authors):

      While the flow of experiments is logical in general, I see major problems regarding the structure of the manuscript and essential controls: 

      • It is very confusing to have samples (kst-CRISPRa) in figures 1-3 that were not introduced in the text until the second-last paragraph of the results. I would suggest introducing this elegant overexpression experiment early in the manuscript as it fits well in the scope of these experiments or alternatively (if the authors prefer) make a new figure containing all the data regarding the overexpression in the end. 

      Response: We have now moved these results to a new figure (new Fig 7) that is described later in the text.

      • At the beginning of the manuscript, essential controls regarding the knockdown efficiency are missing in the main figure. Many of the key experiments are based on KD and as a reader, I want to assess their efficiency. Only in Figure 4, at the end of the manuscript, KST and α-Spec KD efficiency is revealed - this should be shown earlier and quantified properly. While reading the manuscript in its current form, the doubt remains that differences e.g. in α-Spec and KST KD can be explained by varying knockdown efficiencies as their levels can't be assessed. 

      Response: We have now moved these results to a new supplemental figure (Fig 1-supplement 1) that is cited earlier in the text.

      • On a similar line, in Figure 5 where myosin activity is perturbed, induction or repression of myosin activity is only suggested but not formally shown. The authors have to demonstrate that this is indeed the case by showing the myosin signal, ideally accompanied by measurement of tissue tension. 

      Response: This was not included because we and others have assessed these manipulations in earlier publications. However, as requested we have now added a supplemental figure (Fig 6 supplement 1) showing myosin levels in these genotypes.

      • On p. 7, the authors claim that "The epistasis of jub to kst suggests that βH-Spec regulates wing size through its tension-dependent regulation of Jub." While the authors show that KST KD increases myosin and junctional Jub, and that the wing overgrowth phenotype of KST KD depends on Jub, the tension-dependency was not demonstrated. To make that claim, the tension profile should be perturbed e.g. by overexpression of rok, myosin mutants (as the authors do in Fig 5) and the effect on Jub should be analyzed. Induction of tension in these conditions should be measured by laser ablation or a suitable alternative method. It might well be that the induction of Jub in KST KD is not via tension but an alternative mechanism such as the release of steric hindrance, interaction competition, etc. Also: Does KD of Jub affect spectrin localization? 

      Response: The effect of tension on Jub, and the effects of the myosin activity changes we employed on tension, have been analyzed in prior publications (eg Rauskolb et al 2014). To further address the issue raised by the reviewer here as to whether Kst affects Jub and wing growth via tension, we have also now added an additional experiment (Fig 3 supplement 1) in which we decreased tension in a βH-Spec RNAi wing disc by simultaneously expressing RNAi targeting Rok. The results show that the wing growth and Jub accumulation associated with βH-Spec RNAi are suppressed by Rok RNAi, consistent with our conclusion that these effects are mediated via cytoskeletal tension.

      As KD of Jub alters the pattern of myosin accumulation in wing discs (Rauskolb et al 2019) it could be expected to have a complementary influence on βH-Spec localization, but we have not examined this.

      • The authors make a very strong point in saying "The influence of βH-Spec on junctional tension is thus a direct consequence of its competition with myosin for overlapping binding sites on F-actin." While the authors provide some in vitro and in silico evidence, it was for example not possible to outcompete myosin by increasing levels of KST CH1-CH2 domains in vitro (for possible reasons the authors discuss). More importantly, the hypothesis that competition for actin binding is the definite cause of the antagonizing effect was not tested in vivo. Overexpression of a mutant version of KST that is unable to bind F-actin, or that has an increased affinity (etc) for actin was not tested. Such an experiment would be very valuable to enrich this manuscript but at least, claims like that have to be less bold and need to be written in a more speculative language. 

      Response: We consider creating and analyzing mutant forms of Kst in vivo to be beyond the scope of this manuscript, but as suggested we have now modified the text highlighted by the Reviewer to be more cautious.

      Further points: 

      • Why does the thickness of the wing disc epithelium change due to KST and α Spec KD, the authors should introduce this experiment better and draw a proper conclusion. Is there any relocalization of myosin along the apical-basal axis? Can the authors speculate about the differences between KST and α Spec KD? 

      Response: The epithelium thickness changes with α-Spec KD, but does not change with Kst KD. We think the explanation is provided by work from the Pan lab (done mainly in pupal eyes), which reported decreased cortical tension and increased apical area when α-Spec is lost. The interpretation in essence is that with the loss of attachment of F-actin to membranes along the lateral sides of the cells, the sides of the cells are "softer" and the cells expand laterally and thus also (by conservation of volume) shorten apical-basally. This is somewhat speculative, and it's not a focus of our study, but we have added some text to try to explain this better. Myosin along the apical-basal axis was not visibly altered, but it is harder to analyze there because it is very weak compared to junctional myosin.

      • Given the authors' observation of differences in the relative localization of KST and α Spec (Figure 4), proper quantification of KST, α Spec and myosin levels along the apical-basal cell axis would be important. This would also ease data interpretation. 

      Response: We have now added a higher resolution image and also a line scan of Kst, α-Spec and Myo in a new supplemental figure (Fig 6 supplement 1).

      • KD of α Spec seems to induce myosin activity more, causes a bigger reduction of wing thickness, a stronger induction of Jub, and a similar effect on wing size. What led the authors to focus on KST rather than α Spec regarding the detailed analysis of myosin competition? 

      Response: Our observations identify a competition between Kst and myosin, but we have no indication that α-Spec competes with myosin. (It's conceivable that β-Spec might also compete with myosin in some contexts, but wing discs would not be a good place to examine this because the localization profiles of β-Spec and Myosin are so different).

      • A big criticism regarding the figures is the bad color choice which makes it difficult to decipher the fluorescent signals. Likewise, the labels are difficult to read with the present coloring. They should really be changed. 

      Response: We have now changed the single color images to gray scale (for multi-color images we retain RGB coloring).

      A minor point: 

      • To make the manuscript more accessible for researchers outside the Drosophila field, I'd suggest adding explanatory labels for Drosophila-specific terms such as hyperactive myosin for sqhEE, a scheme to show where UAS-dcr2 is active, explain the purpose of Rfp expression as a control for tissue specificity, etc. 

      Response: We have added some explanations to the text to try to make this clearer.

      Reviewer #2 (Recommendations For The Authors):

      Major points: 

      In lines 99-101, the authors mention that Deng et al., 2015 report that the depletion of spectrins leads to an increase in pMLC, with no associated changes in the colocalization of myosin and F-actin. It is more accurate to mention that Deng et al. suggest that the levels of a GFP-tagged rescue construct of MLC (Sqh) are unchanged in alpha-spectrin mutants, although this was not formally quantified. Moreover, there was not a formal assessment of colocalization between MLC and F-actin, but rather a suggestion that F-actin levels are unaffected by the alpha-spectrin mutation. Finally, Deng et al. mostly analyzed alpha-spectrin so it remains possible that the new results shown by the authors are compatible with the initial observations from Deng and colleagues. 

      Response: As suggested, we revised the text to note that Deng et al., 2015 specifically examined Sqh:GFP. While we agree that our focus is more on Kst and Deng et al focused on α-Spec, we also examined α-Spec, and as described our results examining Myosin and Jub differ from what was reported by Deng et al 2015.

      As mentioned above, it is still possible that spectrins and Ajuba are working in parallel and Ajuba is not necessarily downstream of spectrins. The strong phenotype of Ajuba RNAi flies in adult wings could mask the effect of spectrins. Are the results similar in other settings, such as in the absence of Dicer2? Also, can Ajuba RNAi phenotypes be modified by overexpression of spectrins? This would provide further evidence of a link to Ajuba function. 

      Response: While formally it is true that the genetic requirement for Jub could reflect a role in parallel to, rather than downstream of, spectrins, our conclusion that spectrins act through Jub is based not only on the genetic requirement for Jub, but also on the influence of spectrins on junctional tension and Jub localization, which indicate that spectrins influence Jub activity in a manner consistent with their affecting the Hippo pathway through Jub.

      We would not expect over-expression of spectrins in a jub RNAi background to further reduce Hippo signaling, and as the jub RNAi phenotype is much stronger than the Kst over-expression phenotype even if there were an effect it would likely be difficult to detect.

      Regarding the potential independent functions of spectrins, it would be interesting to determine if alpha- and beta-heavy-spectrin can still interact at the level of the AJ despite the fact that their distributions appear to be partly non-overlapping. Would it be possible to assess this using PLA? If an interaction is not detected via PLA, it would be more convincing that spectrins are functioning independently. 

      Response: We have now performed this experiment, and no significant signal was detected by PLA. As a control, we used identical antibodies (GFP and α-Spec) to conduct PLA on α-Spec and β-Spec, and we did detect signal by PLA. These results (included in a revised Figure 4) further support the conclusion that α-Spec and βH-Spec are not physically associated in wing discs.

      Related to this point, if the spectrins work independently, it is reasonable to assume that they could display additive effects. Is this the case? If alpha- and beta-heavy-spectrin are simultaneously depleted are the phenotypes more severe than either depletion alone? 

      Response: We disagree here. Since both alpha- and beta-heavy-spectrin act through tension and Jub, there is likely a limit as to how much Yki activity can be increased by this pathway. For example, the increases in wing size induced by spectrin RNAi are similar to the increases in wing size observed with constitutive recruitment of Jub through alpha-catenin mutation (Alegot et al 2019), which may thus represent the maximum increase that can be induced through this pathway (as there are multiple, independent factors that regulate Hippo signaling).

      Authors should modulate membrane tension and assess if this affects the localization of alpha- and beta-heavy-spectrin and, specifically, their colocalization, as their interaction could be regulated. 

      Response: As reported, we do see effects of tension on βH-Spec localization. We would not expect significant effects of membrane tension on α-Spec localization, but we consider analysis of this outside the scope of this manuscript.

      In lines 185-187, the authors mention that beta-spectrin depletion does not affect beta-heavy-spectrin localization. Interestingly, Figure 4E appears to show that the levels of Kst-YFP appear to be lower in the beta-spectrin-depleted tissue. The localization of beta-heavy-spectrin is not necessarily affected but the overall levels could be. 

      Response: Indeed the levels appear slightly lower, but elucidating the reason for this will require further experiments that are beyond the scope of this manuscript (we suspect it is because cytoskeletal tension increases in β-Spec-depleted tissue as it does in α-Spec-depleted tissue, which based on our observations should decrease levels of Kst near junctions). The key point of these experiments was to show that α-Spec localization does not require βH-Spec, but does require β-Spec, which supports our conclusion that in wing discs α-Spec forms a complex with β-Spec but not with βH-Spec.

      In lines 200-203, the authors state that beta-heavy-spectrin and myosin colocalize extensively at the apical region. However, this colocalization is not as clear as stated. Do the authors have alternative data that suggests that the two proteins are indeed colocalizing? Would it be possible to perform PLA to detect a potential colocalization? 

      Response: Unfortunately we do not have antibodies against both proteins that work well enough for PLA. However, we quantified the co-localization by analysis of Pearson's correlation coefficient, as reported in the manuscript. We also added an additional higher magnification image, and a line scan, in a supplemental figure (Fig. 6 supplement 1).
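      For readers unfamiliar with this measure, the following is a minimal sketch of how a Pearson's correlation coefficient can be computed from paired pixel intensities in two channels. It is illustrative only and is not the analysis pipeline used in the manuscript; the function name `pearson_r` and the intensity values are hypothetical.

```python
from math import sqrt


def pearson_r(channel_a, channel_b):
    """Return Pearson's correlation coefficient for two equal-length
    sequences of pixel intensities (one value per pixel per channel)."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / sqrt(var_a * var_b)


# Hypothetical intensities for the same five pixels in two channels.
channel_1 = [10, 20, 30, 40, 50]
channel_2 = [12, 22, 29, 41, 52]
print(pearson_r(channel_1, channel_2))
```

      Values near 1 indicate that the two signals rise and fall together across pixels, values near 0 indicate no linear relationship, and negative values indicate that one signal tends to be high where the other is low.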

      Authors should try to assess and quantify colocalization with F-actin for both beta-heavy-spectrin and myosin in wild-type conditions and when the levels (and/or activity) for each of them are modulated. 

      Response: We have added quantification of the co-localization of βH-Spec with F-actin and of myosin with F-actin to the revised manuscript.

      Minor points: 

      In lines 122-124, the authors should clarify the relevance of the observation that alpha-spectrin knockdown affects the thickness of the wing disc epithelium. 

      Response: We have added some text to try to elaborate on this.

      In the intro, it is perhaps necessary to mention that there are conflicting reports regarding the role of spectrins in the regulation of cell proliferation, at least in the follicular epithelium. For instance, Ng et al., 2016 argued that spectrins do not regulate cell proliferation in FECs. 

      Response: Rather than wading into a detailed discussion of issues that are peripheral to this study, we modified the text in the Introduction to avoid implying that spectrins control cell proliferation in the ovary.

      In Figures 1, 2, 3, and 4 (and respective supplements), it is encouraged that, wherever appropriate, the authors mark the different compartments or the relevant boundary using dashed lines, to more clearly indicate the regions to compare. 

      Response: We have now done this.

      In Figure 2, supplement 1 panels C and D should have an indication of the genotype for clarity. 

      Response: We have now added this.

      In lines 362-367, the authors suggest that other actin-binding proteins are likely to influence the role of beta-heavy-spectrin. Have the authors tested the role of spectrin interactors such as Ankyrin and Adducin?

      Response: No, we have not examined this.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2023-01939

      Corresponding authors: Jiro Toshima, Junko Y. Toshima

      1. __General Statements__

      We are grateful for the reviewers’ evaluation of our study. In the new manuscript, we have addressed all of the points raised by the two reviewers (the altered or added text is indicated in red in the new manuscript). Reviewer #1 pointed out that the definition of "Vps21 activity" was unclear throughout the manuscript. In this study we developed a novel biochemical method capable of detecting Vps21p activity with high sensitivity (Fig. 2) and used this method to measure Vps21p activity, which is now clearly stated in the new manuscript. Reviewer #1 also pointed out that we had not clearly explained the difference between the two types of Vps21p-residing structures, small endosome-like puncta and aberrant large structures. To distinguish them clearly, in the new manuscript we have added data showing the size distribution of Vps21p-residing structures (Fig. S2). Regarding comment #2, we think that the reviewer may have misunderstood the data (please see our response to this comment below). Reviewer #2 did not request any additional experiments but gave us many helpful comments to improve the manuscript. In the new manuscript, we have revised all the places that the reviewer pointed out.

      __Point-by-point description of the revisions__

      __Reviewer #1 (Evidence, reproducibility and clarity (Required)):__

      (Reviewers’ comments are in italics)

      *Summary:*

      *In the present study Nagano et al. identify an overlapping function of clathrin adaptors in the activation of the yeast Vps21 Rab GTPase. This activation is regulated in a concerted manner by two TGN cargo adaptors, AP-1 and GGA1/2. The basis of this study is derived from the previous work Nagano et al., 2019 where authors reported that Ent3p and Ent5p are important for the formation of the Vps21p-positive endosome. By utilizing a synthetic genetic approach, the authors observed that disruption/loss of the AP-1 complex (apl4 mutant), Ent3p, Ent5p or Pik1 decreased fluorescence intensity for GFP-Vps21p and increased number of Vps21p puncta. They found that these effects for AP-1 disruption are additive, that is, each makes a distinct contribution, at least in ent3∆/ent5∆ mutant cells. They next examined the role of factors required for TGN localization of Ent3p/5p and AP-1 in Vps21p activation. The authors reported that GGA1/2, Pik1p and the Ypt31/32 Rab GTPases make modest contributions to targeting of AP-1 and Ent3/5 to the TGN. The observation that accumulation of GFP-Vps21 next to vacuolar compartments in pik1-1 ent3∆ mutants was similar to that of ent3∆ ent5∆ apl4∆ led authors to conclude that both PI(4)P as well as PI(4)P independent Ent3p recruitment to TGN plays a crucial role in Vps21p activation. Further they found that compared to the pik1-1 ypt31ts mutant (41%), activity of Vps21p (14%) was severely reduced in the pik1-1 ypt31ts gga1∆ gga2∆ mutant pointing towards redundancy among these factors in Vps21p activation. Finally using a class E Vps mutant authors found a fall in endosomal population of GFP-Vps9p ~29% in the ent3∆ ent5∆ mutant, which was further reduced to 0% in the ent3∆ ent5∆ apl4∆ mutant. Collectively this study suggests a differential role of TGN adaptors, AP-1 and GGA in early endosome formation.*

      *Ent3p/5p and AP-1 are proposed to activate Vps21p by localizing Vps9p on endosomes and thus facilitating its transport whereas GGAs act redundantly along with Pik1p and Ypt31/32 in regulating TGN localization of Ent3p/5p and AP-1.*

      *Major comments:*

      *There is a considerable amount of data that address the roles of AP-1, Ent3, Ent5, Gga1/2, and Pik1 in targeting of Vps21 and related trafficking pathway components to the TGN/endosome. The experiments are essentially genetic epistasis tests that compare the fluorescence patterns of GFP-Vps21 in a sophisticated set of strains. The genetic data are interpreted in terms of spatiotemporal dynamics of Vps21: proportion Vps21GTP on a compartment and number of GFP-Vps21 positive compartments. Being genetic in nature, the data are open to wide interpretations in terms of molecular mechanisms that target candidate proteins Vps21p and Vps9 to the TGN/endosome. The authors' presentation (Fig. 7) is based on well controlled experiments and is logical, but key questions regarding Vps9 trafficking as it relates to Vps21 endosome formation are not resolved.*

      Response:

      In this study, in addition to comparison of the fluorescence patterns of GFP-tagged yeast Rab5 (Vps21p), we have developed a novel biochemical method capable of detecting the amount of active Vps21p with high sensitivity. The amount of active Vps21p obtained by this method correlated well with the results obtained by imaging analysis, and we think this approach significantly increased the reliability of our results.

      Using this new biochemical method and fluorescence imaging analysis, we have clarified the overall regulatory mechanisms of Vps21p by vesicle transport from the TGN. In particular, we believe that this is an important study that links the activation of Vps21p that mediates endosome formation with numerous previous studies involving vesicle transport from the TGN to the endosome.

      Comment #1(a)

      • *Throughout their study the authors conflate measurements of GFP-Vps21 puncta intensity and number of Vps21p puncta as readouts of Vps21 "activity". Figure 7 exemplifies this especially: "Vps21p Activity: 100%; Vps21p Activity: 45%; Vps21p Activity: 10%".*

      *a) Would the authors please explicitly define how they use "activity" in the manuscript?*

      Response:

      We thank the reviewer for pointing out our error. As the reviewer noted, we used the word “activity” when describing results based on the fluorescence intensity and the number of Vps21p puncta (lines 312-315 in the new manuscript). We have therefore revised the sentence “~ a decreased PI(4)P level reduces Vps21p activity and thus inhibits fusion of Vps21p compartments.” to “~a decreased PI(4)P level seems to inhibit fusion of Vps21p compartments.” (lines 314-315).

      In other parts of the manuscript, we have used the word “activity” only when describing results obtained by measuring the amount of active Vps21p with the biochemical method (Fig. 2). The “Vps21p Activity” values depicted in Fig. 7A-C are also based on the results of the biochemical assay, and we have therefore added explanatory sentences to the Discussion section (lines 432-433, 447) and the figure legend (lines 996-998) in the new manuscript.

      Comment #1(b)

      *b) The amounts of Vps21-GTP were measured for the ent3∆ ent5 and ent3∆ ent5 apl4∆ mutants (Fig. 2). Other mutant backgrounds should be analyzed in order to address the specific requirements of gga1/2, pik1 and ypt31/32 genes and to challenge the assumption that aspects of GFP-Vps21 localization correlate with the proportion of Vps21GTP.*

      Response:

      We agree with the reviewer that it is crucial to confirm that aspects of GFP-Vps21 localization correlate with the proportion of Vps21GTP. In the previous manuscript, we had already measured the amount of active Vps21p (the GTP-bound form of Vps21p) in the pik1-1 and pik1-1 ent3∆ mutants and shown that, relative to wild-type cells, it decreases to ~62% in the pik1-1 mutant and to ~22% in the pik1-1 ent3∆ mutant (Fig. 4E). The relative amount of the GTP-bound form of Vps21p in these mutants correlated well with the results obtained by imaging analyses of GFP-Vps21p (Fig. 4B and C). To make this clearer, we have added the sentence “and the amounts of active Vps21p in these mutants correlate well with the results obtained by imaging analyses of GFP-Vps21p (Fig. 4B, C, and H).” in lines 326-327. We have also demonstrated that the amount of active Vps21p correlated with the fluorescence intensity of GFP-Vps21p at puncta in the pik1-1 ypt31ts and pik1-1 ypt31ts gga1∆2∆ mutants (Figs 4F-J, S4E), and have explained this in lines 334-341.

      Comment #1(c)

      *c) Regarding the measurements of fluorescence intensity of GFP-Vps21 puncta, how were distinct puncta identified, particularly in the large clusters of puncta shown in Figs. 1D, 3A, 4F, 5A, 5C?*

      Response:

      As the reviewer pointed out, in the previous manuscript we did not clearly explain how we distinguished the two types of Vps21p-residing structures, small endosome-like puncta and aberrant large structures. To distinguish them clearly, in the new manuscript we examined the size and number of these structures and present the data in Fig. S2. This analysis revealed that the ent3∆5∆ apl4∆ mutant contains a single large Vps21p-residing structure with a size of >100 pixels and many small Vps21p-residing puncta with a size of ~50 pixels. We have added sentences explaining this in lines 235-239. Because Fig. 5A and 5C do not show the localization of Vps21p, we have not added an explanation for them.

      Comment #2

      • *In the representative micrographs shown in Fig. 1A (Vph1-mCH), 1B (Hse1-tdTom), 1D (Sec7-mCH) and 5A, why do only (roughly) half of the cells in each micrograph express the tagged organelle marker protein? Shouldn't all of the cells? What is especially concerning is that the appearance of GFP-Vps9 in cells that express Sec7-mCH is different than in cells that do not. Specifically, there are fewer GFP-Vps9 puncta in expressing cells and GFP-Vps9 appears to be largely cytosolic in these cells. Have the authors noted the same?*

      Response:

      In Fig. 1, we expressed the mCherry/tdTomato-tagged protein only in wild-type cells (Fig. 1A and B) or only in ent3∆5∆ mutants (Fig. 1D) to distinguish the mutant cells from the wild-type cells, as described in the Results section (lines 156-159) and the figure legends. By labeling only wild-type or only mutant cells, we could precisely evaluate differences in the localization of GFP-Vps21p by comparing mutant cells directly alongside wild-type cells.

      In Fig. 5A, we expressed Sec7-mCH only in the ent3∆5∆ mutants to distinguish them from wild-type cells (upper panels) or from the ent3∆5∆ apl4∆ mutants (lower panels), as described in the figure legend. Therefore, the reviewer’s comment that “the appearance of GFP-Vps9 in cells that express Sec7-mCH is different than in cells that do not. Specifically, there are fewer GFP-Vps9 puncta in expressing cells and GFP-Vps9 appears to be largely cytosolic in these cells.” is exactly what we wanted to show in this figure. To show this more clearly, we have labeled cells with “WT” or “mutant” in these micrographs (Fig. 1A, 1B, 1D, and 5A).

      Comment #3

      • *Figure 4A: How were the proportional contributions of each factor to the TGN localization of Ent3/5, AP-1 determined? What do the percentiles indicate?*

      Response:

      As described in the Results section (lines 293-297), we have shown that deletion of the GGA1 and GGA2 genes significantly decreased the localization of Ent3-GFP at the TGN to ~33% of the wild-type level, without changing the localization of Ent5-GFP and Apl2-GFP (Fig. S3A, B). Based on these results, the contribution of Gga1/2p to the localization of Ent3p, Ent5p, or AP-1 was evaluated to be 37%, 0%, or 0%, respectively (Fig. 4A). To make this clearer, we have added the sentence “~ and thus, we evaluated the contribution of Gga1p/2p to the localization of Ent3p, Ent5p, or AP-1 to be 37%, 0%, or 0%, respectively (Fig. 4A)” in lines 296-297. Similarly, we determined the contribution of PI(4)P by assessing the localization of Ent3p, Ent5p and Apl2p at the TGN in the pik1-1 mutant (Fig. S3C and D), as described in lines 297-305. Regarding Rab11s (Ypt31p/32p), we evaluated the contribution based on the data in our previous study, as described in lines 305-309.

      Comment #4

      • *In the model presented in Figure 7, the authors proposed that AP-1 is required to target Vps9 from the late TGN to the early TGN. The best characterized function of AP-1 is to concentrate integral membrane proteins to form the inner layer of a clathrin coated vesicle. Vps9 is a soluble protein that fractionates with cytosolic proteins (Burd et al., 1996). Despite measuring intensity and localizing Vps9p with different endosomal markers (Fig. 6), the basis of membrane recruitment of Vps9 by TGN clathrin adaptors is unclear. How do the authors envision AP-1 to function in targeting of Vps9, a soluble protein, between compartments?*

      Response:

      As with many other Rab GEFs (e.g., Sec2p, the GEF for Sec4p, or Mon1p/Ccz1p, the GEF for Rab7), we think that Vps9p transiently localizes to the donor organelle to activate Rab proteins and load them onto the transport vesicle. We have previously demonstrated that Arf1p, a Golgi-resident GTPase, plays an important role in the recruitment of Vps9p to the Golgi (Nagano et al., Comm. Biol., 2019). In this study we have shown that deletion of AP-1 in the ent3∆5∆ mutant increases the localization of Vps9p at the TGN (Fig. 5A and B). These results suggest that AP-1, like Ent3p/5p (Nagano et al., Comm. Biol., 2019), is dispensable for the recruitment of Vps9p to the TGN but required for the transport of Vps9p from the TGN to endosomes.

      In a recent study, Casler et al. proposed that AP-1 maintains Golgi-resident proteins by mediating an intra-Golgi recycling pathway (Casler et al., JCB, 2021). Based on this model, we speculated that AP-1 also functions to maintain Vps9p in the TGN by recycling it from the late TGN to the early TGN, and we discussed this in the second paragraph of the Discussion section (lines 434-454 in the new manuscript). However, as reviewer #2 pointed out (please see comment #6 of reviewer #2), Casler et al. proposed a role for AP-1 in transport from the TGN back to earlier Golgi compartments but did not discuss compartmentalization within the TGN. We have therefore modified the sentence in the Discussion from “~ the role of AP-1 that recycles Vps9p back to the early TGN might become apparent” to “~ the role of AP-1 that recycles Vps9p back to the earlier Golgi compartment might become apparent” (lines 444-445).

      __Minor Comment:__

      • *The interchangeable terminology used to refer to Rab GTPases throughout the manuscript made it exceptionally difficult for me to focus on the presentation of the experiments. Vps21 and Rab5 are used interchangeably, but this study investigated Vps21, not Rab5. Vps21 does not even appear in the title or abstract. Similarly, Ypt31/32 is used interchangeably with Rab11, but this study investigated Ypt31/32, not Rab11. The accurate names of the yeast proteins should be used. A discussion regarding significance of the yeast proteins for understanding mammalian Rab5 and Rab11 belongs in the Discussion.*

      Response:

      In accordance with the reviewer’s suggestion, we have replaced Rab5 with yeast Rab5 or Vps21p. We have also replaced Rab11 with yeast Rab11 or Ypt31p/32p.

      __Reviewer #1 (Significance (Required)):__

      *General assessment: In general, this is a well-executed and controlled study. The major strengths are the large quantity of data from complementary experiments that provide a rationale for the proposed mechanistic model (Fig. 7). The major weaknesses lie with the genetic approach, which does not lend itself to the mechanistic interpretations that the authors propose, and the narrow scope of the work such that the study will be of interest to a small group of colleagues. The audience will likely include researchers who use yeast to investigate protein sorting in the endo-lysosome network of organelles and colleagues who investigate signaling by Rab GTPases.*

      Response:

      We cannot agree with the reviewer’s comment that “the narrow scope of the work such that the study will be of interest to a small group of colleagues”, because the regulation of endosome formation by Rab5 is one of the major topics in the field of membrane traffic, and many mechanisms still remain to be elucidated. Moreover, the model we have proposed in this study is applicable not only to yeast but also to higher organisms, as discussed in the last paragraph of the Discussion section. The endolysosomal pathway is important for the regulation of a wide variety of crucial cellular processes, including mitosis, antigen presentation, cell migration, cholesterol uptake, and many intracellular signaling cascades. Our work thus also has implications for development, immunity, and oncogenesis. We believe that the studies described in our paper represent an advance in our understanding of the cellular biology of endocytic trafficking and therefore would be interesting to researchers in other fields, as well as in the membrane traffic field.


      __Reviewer #2 (Evidence, reproducibility and clarity (Required)): __

      (Reviewers’ comments are in italics)

      *Summary: *

      *The manuscript by Nagano et al. describes the results of extensive analysis on the roles of clathrin adaptors for activation of Rab5 during TGN-to-endosome traffic in budding yeast. They examined the localization and activation status of Vps21, a major Rab5 member in yeast, in a variety of mutants and showed that AP-1 had a cooperative role with Epsin-related Ent3/5 in transport of Vps9 (Rab5 GEF) to endosomes. GGAs, PI4 kinase Pik1, and Ypt31/32 (Rab11) had partially overlapping functions in recruitment of AP-1 and Ent3/5 to TGN. *

      *It is an indeed extensive study but the interpretation of the results is complicated and somewhat speculative. It is most probably because the differences between mutants are partial (even though the authors tried to show statistics) and the logics to lead conclusions are not always compelling. To be honest, I had a hard time to follow rationales to justify arguments. The conclusions the authors make, that is, multiple clathrin adaptors cooperate in the TGN-to-endosome traffic, are reasonable, but I have several questions as follows, which I would like the authors to address. *

      Comment #1

        • The description about Vps21 fluorescence is often quite confusing. When the authors say fluorescence intensity, is it the total intensity of a whole cell or the average fluorescence intensity of individual puncta? For example, in Fig. 1D, it doesn't look to me at all that the GFP intensity of ent3/ent5 is lower than WT. How did the authors obtain the data of Fig. 1E? If the authors measured the fluorescence of individual puncta, how did they do it? *

      Response:

      We agree that the previous manuscript did not sufficiently explain how we measured Vps21p fluorescence intensity. In this study, we measured the total fluorescence intensity of each single GFP-Vps21p punctate structure, subtracted the cytoplasmic fluorescence background, and reported the result as the fluorescence intensity of the Vps21p compartment (the aberrant large GFP-Vps21p structures (Fig. 3A) were excluded). The graphs of GFP-Vps21p fluorescence intensity show the average of three values, each the average of 50 puncta, from three independent experiments. To clarify where and how Vps21p fluorescence was measured, we have revised the text of the new manuscript (lines 160-161, 163, 166, 177, 179) and added explanatory sentences to “Materials and Methods” (lines 542-546).
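
      The quantification described above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis pipeline: the function names are hypothetical, and the way the cytoplasmic background is removed (a per-pixel background value multiplied by the punctum area) is an assumption, since the response does not specify it.

```python
from statistics import mean

def punctum_intensity(raw_intensity, background_per_pixel, area_pixels):
    """Integrated intensity of one punctum minus the cytoplasmic background.

    Assumption: the background is estimated per pixel and scaled by the
    punctum area; the response does not state the exact procedure.
    """
    return raw_intensity - background_per_pixel * area_pixels

def experiment_mean(puncta):
    """Mean background-subtracted intensity over the puncta of one experiment."""
    return mean(punctum_intensity(*p) for p in puncta)

def reported_value(experiments):
    """Average of the per-experiment means (three experiments in the response)."""
    return mean(experiment_mean(e) for e in experiments)
```

      In the response, each experiment would contribute 50 puncta as `(raw_intensity, background_per_pixel, area_pixels)` tuples, and `reported_value` would be called on the three experiments.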

      Regarding Fig. 1D and E, since the fluorescence intensity of GFP-Vps21p in the cytosol was increased in the ent3Δ5Δ mutant (Fig. 1D), the fluorescence intensity in the mutant may not have appeared lower than that in the wild-type cell. To show the decrease in the fluorescence intensity of individual Vps21p puncta in the mutant cells more clearly, we have added a higher-magnification view of the GFP-Vps21p puncta to Fig. 1D in the new manuscript.

      Comment #2

      • Related to the previous question, how the images were taken is very important. In the legend to Fig.1, there is no description about the image analysis. Are they epifluorescence images or confocal images, and if the latter, are they ones of 2D confocal images or maximum intensity projections of Z stacks as mentioned in the legend to Fig. 3A? It matters very much. *

      Response:

      We appreciate the reviewer’s helpful suggestion. In Fig. 1, we used epifluorescence images to analyze the fluorescence intensity and number of GFP-Vps21p puncta, because Vps21p puncta are highly mobile (please see also the response to comment #9). In accordance with the reviewer’s suggestion, we have added a description of the imaging method to the legend of Fig. 1 (lines 831-832, 837 and 843).

      Comment #3

      • It is also confusing when the authors say increase or decrease of fluorescence. Is it the intensity or the number of puncta? Please clarify which the authors intend to mention whenever relevant. There are many places that bother readers. *

      Response:

      We appreciate the reviewer’s helpful suggestion. In accordance with it, we have revised the manuscript (lines 274 and 316).

      Comment #4

      • The method the authors developed to estimate the activation states of Vps21 is intriguing. It may provide important information without direct measurements of the GTP-binding activity. However, the results should be carefully interpreted because this kind of tricky experiments may not reflect the exact biochemical statuses in the cell. For example, I am concerned about whether release of GTP or spontaneous GTPase activity during the preparation processes is ignored. *

      Response:

      As the reviewer pointed out, we cannot rule out the possibility that the GTP-bound status changes during the preparation process. However, this problem also occurs in the conventional pull-down assay, which assesses the amount of the GTP-bound form of Rab proteins. To confirm that the Vps21p activity assessed by this method reflects the in vivo activation level, we have demonstrated that the level of active Vps21p correlates with in vivo phenotypes that indicate a defect in endosomal fusion, such as the fluorescence intensity of GFP-Vps21p at the endosome and the number of GFP-Vps21p puncta. In the new manuscript, we have added sentences explaining this (lines 221-222).

      Comment #5

      • In Discussion (p. 20, line 410), the authors describe that "Gga2p is localized predominantly at the Tlg2-residing compartment," but this is wrong. In the BioRxiv paper (2022), the authors showed that "Gga2p appears around the Sec7p-subcompartment and disappears at a similar time as Sec7p." I understand that, to explain the roles of GGAs in endosomal transport, it is reasonable to assume their presence in the Tlg2 compartment (and I agree on that), but the above description is wrong and must be corrected. *

      Response:

      We appreciate the reviewer’s helpful suggestion. As the reviewer described, we have recently demonstrated that Gga2p localization well overlapped with the Tlg2p-residing TGN sub-compartment that is structurally distinct from the Sec7p-residing sub-compartment (Toshima et al., BioRxiv, 2022). Thus, in accordance with reviewer's suggestion, we have changed this sentence to “Interestingly, Gga2p appears to reside at the Tlg2p sub-compartment, which is distinct from the Sec7p sub-compartment.” in the new manuscript (lines 427-428).

      Comment #6

      • Hypothesizing the role of AP-1 in the recycling from the late TGN to the early TGN is new. Glick's group proposed its role in transport from the TGN back to earlier compartment (Golgi) but did not discuss compartmentalization within the TGN. The authors' speculation is a fancy idea, but I am afraid there is no direct evidence for that. *

      Response:

      We appreciate the reviewer’s appropriate and helpful suggestion. As the reviewer pointed out, Glick's group has proposed a role for AP-1 in transport from the TGN back to an earlier Golgi compartment, but has not discussed compartmentalization within the TGN (Casler et al., 2021, JCB). We have therefore modified the sentence in the Discussion section from “~ the role of AP-1 that recycles Vps9p back to the early TGN might become apparent.” to “~ the role of AP-1 that recycles Vps9p back to the earlier Golgi compartment might become apparent.” (lines 444-445).

      Comment #7

      • The role of Ypt31/32 (Rab11) is also puzzling to me. It could be an indirect effect, which might be due to the complex network of GTPases as proposed by Chris Fromme (2014). Am I correct? *

      Response:

      As the reviewer pointed out, Fromme’s group has shown that Ypt31/32 forms the complex networks with several GTPases and their GEFs (McDonold and Fromme, 2014, Dev Cell; Thomas and Fromme, 2016, JCB; Thomas et al., 2019, Dev Cell), in which Ypt31/32 promotes the activation of Arf1p via its GEF Sec7p. We have previously shown that Arf1p plays an important role in the recruitment of Vps9p to the Golgi (Nagano et al., Comm. Biol., 2019). These findings suggest that disruption of Ypt31p/32p may affect the localization of Vps9p through reduced activity of Arf1p. However, arf1Δ and ypt31ts mutants exhibit different effects on the Vps9p localization: in arf1Δ mutant the recruitment of Vps9p to the TGN is impaired and in ypt31ts mutant Vps9p localization at the TGN is increased (Nagano et al., 2019, Comm Biol.). Thus, the role of Ypt31/32 in the Vps9p localization appears to be independent of Arf1p activity. In the new manuscript, we have added a brief discussion about this (lines 466-473).

      Comment #8

      • In the legend to Fig. 3D, the authors state that the red arrowheads indicate 50 nm vesicles and black arrowheads indicate vesicle clusters. However, the electron micrograph clearly shows that their morphologies are different. Red ones, which I estimate to be a little larger than 50 nm, often appear to have dense material inside, while those in black are even larger (probably around 200 nm) and do not look like a cluster of the same type of vesicles (I do not even think that such large structures should be called vesicles). How do the authors explain these differences? *

      Response:

      In the previous manuscript explanation about the electron microscopy analysis was insufficient. In the new manuscript, to clearly distinguish two Vps21p-residing structures, small endosome-like puncta and aberrant large structure, observed in ent3Δ5Δ apl4Δ mutant by fluorescence microscopy (Fig. 3A), we examined the size and number of these structures and showed the data in Fig. S2. This result revealed that the ent3Δ5Δ apl4Δ mutant contains single aberrant large aggregate with a size of >100 pixel adjacent to the vacuole and endosome-like structures with a size of

      Comment #9

      • In Fig. 4F, the authors show different sets of images, Focal plane and Z projection. What is the purpose to do it? The results with Z projection should be more informative. Why the authors use only Focal plane data for the analysis in panel G? *

      Response:

      We measured the fluorescence intensity or number of individual GFP-Vps21p puncta using single focal plane images (Figs. 1C, 1E, 3I, and 4B), because the small Vps21p-residing puncta are highly mobile and the same endosome often appears in multiple planes of a Z-stack taken with a conventional epifluorescence microscope. In contrast, we analyzed the aberrant large aggregate using Z-projection images (Figs. 3B, S3G) because this structure is relatively stable, has low motility, and is not observed unless it lies in the focal plane. In Fig. 4F, since both the small puncta and the large aggregate are analyzed, we have shown both the focal plane image and the Z-projection image. In the new manuscript, we have added a description of the imaging method to each figure legend or to the text (lines 230-232, 332-334).

      __Reviewer #2 (Significance (Required)): __

      *It is a complicated story but I find most of the conclusions reasonable. It provides important knowledge to the understanding on the Rab5 GTPase regulation in trafficking from the TGN. *

      Response:

      We are very grateful for this reviewer’s favorable evaluation of our studies.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      1. General Statements [optional]

      We would like to thank all reviewers for their constructive feedback and for raising specific points that have helped to improve our manuscript. We accept that the initial submission did not include some quantitative aspects of the observed effects. These are now included together with all the suggested experiments from the reviewers with the use of additional mutants and appropriate protein markers. We believe that the manuscript offers a conceptual advance and a molecular mechanism for the effects of caffeine on cell cycle progression of eukaryotic cells and is of interest to geneticists working on cell cycle, cancer and biogerontology.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary:

      In the manuscript “The AMPK-TORC1 signaling axis regulates caffeine-mediated DNA damage checkpoint override and cell cycle effects in fission yeast,” the authors studied the role of genes that are potentially involved in the caffeine-mediated override of a cell cycle arrest caused by activation of the DNA damage checkpoint. The methylxanthine substance caffeine has been known to override the DNA damage checkpoint arrest and enhance sensitivity to DNA damaging agents. While caffeine was reported to target the ATM ortholog Rad3, the authors previously reported that caffeine targets TORC1 (Rallis et al, Aging Cell, 2013). Inhibition of TORC1, like caffeine, was also reported to override DNA damage checkpoint signaling. Therefore, in the present study, the authors compared the effects of caffeine and torin1 (a potent inhibitor for TORC1 and TORC2) on cell cycle arrest caused by phleomycin, a DNA damaging agent, using various gene deletion S. pombe mutants.

      The authors concluded that they identified a novel role of Ssp1 (calcium/calmodulin-dependent protein kinase) and Ssp2 (catalytic subunit of AMP-activated kinase) in the cell cycle effects caused by caffeine, based on the following findings; (1) the caffeine-mediated DNA damage checkpoint override requires Ssp1 and Ssp2; (2) Ssp1 and Ssp2 are required for caffeine-induced hypersensitivity against phleomycin; (3) under normal growth conditions, caffeine leads to a sustained increase of the septation index in a Ssp2-dependent manner; (4) Caffeine activates Ssp2 and partially inhibits TORC1.

      Major comments:

      I do not think that many of the authors’ claims are supported by the results of the present study. The corresponding parts are detailed below.

      1. The conclusion of the first paragraph in the Results (top in page 6; Our findings indicate that caffeine and torin1 indirectly and directly inhibit TORC1 activity respectively.) is not supported by the data in Figure 1. The result that caffeine, but not torin1, requires Ssp1 and Ssp2 to override the phleomycin-induced cell cycle arrest does not necessarily indicate that caffeine indirectly inhibits TORC1 via Ssp1 and Ssp2. Rather, the authors should mention that this conclusion is based on the authors’ previous reports by citing them (e.g., Rallis et al, Sci Rep, 2017). To add to Figure 1, an additional experiment using a constitutively active AMPK mutant, a temperature-sensitive TORC1 mutant, and a srk1 deletion mutant will help the authors claim their original conclusion as one possibility.

      Torin1 inhibits TORC1 and TORC2, leading to G2 cell cycle arrest following accelerated mitosis. In contrast, caffeine has been reported to enhance the inhibitory effect of rapamycin on TORC1 signaling but does not inhibit growth. It has not been reported that TORC1 is a direct target of rapamycin. We previously demonstrated that caffeine induces Srk1 in a Sty1-dependent manner (Alao et al., 2014). Furthermore, Ssp1 plays a role in regulating Srk1/Cdc25 activity. It is therefore possible that Ssp1 influences the ability of caffeine to promote mitotic progression as part of the stress response while also affecting TORC1 activity via Ssp2. As ssp2∆ cells have higher intrinsic TORC1 activity, this could also attenuate the effect of caffeine on mitosis.

      We have modified the first paragraph of the results section to address the reviewer’s concerns.

      We have previously reported that Srk1 modulates the ability of caffeine to drive cells into mitosis (Alao et al., 2014).

      1. The conclusion of the second paragraph in the Results (lower-middle in page 6; Our results indicate that caffeine induces the activation of Ssp2.) is not based on the results of Figure 2. Figure 2 simply illustrates that both caffeine and torin1 cause hypersensitivity to phleomycin dependent on Ssp1 and Ssp2.

      We appreciate the reviewer’s contention and have modified the text.

      1. The conclusion of the fourth paragraph in the Results (middle in page 7) is not clearly supported by the result, due to an insufficient data analysis. As the cell length and the progress through mitosis are the key assay parameters in Figure 3, the average cell length should be shown next to each micrograph of Figure 3A and 3B. In Figure 3C, a mitotic index and the average cell length should be shown next to each micrograph. A statistical analysis is necessary for the authors to compare the measurements and to claim as the headline (Caffeine exacerbates the ssp1Δ phenotype under environmental stress conditions), as the effect of caffeine was not evident.

      We have conducted additional experiments to measure cell length and modified the figure to include these data. We believe our observation that caffeine alone induces increased cell length in ssp1 mutants confirms a role for the Ssp1 protein in modulating the effects of caffeine. We previously showed that caffeine activates Srk1, which in turn inhibits Cdc25 activity, similar to other environmental stresses (Alao et al., 2014). Ssp1 negatively regulates Srk1 following exposure to stress. In contrast, caffeine advances mitosis in wt cells and thus does not result in increased cell length. We also demonstrate that caffeine greatly enhances cell length in ssp1 mutants exposed to heat stress, in marked contrast to rapamycin and torin1. These findings indicate that Ssp1 mediates the effect of caffeine on mitosis.

      1. In the middle of page 8, the statement “Accordingly, the effect of caffeine and torin1 on DNA damage sensitivity was attenuated in gsk3Δ mutants (Figure 5C and 5D).” is not supported by the corresponding results. Rather, Figure 5C and 5D look almost the same.

      We agree with this and other reviewers that demonstrating enhanced sensitivity to caffeine is problematic. Nonetheless, our cell cycle data clearly indicate a differential role for Gsk3 in mediating the cell cycle effects of caffeine and torin1. In terms of DNA damage sensitivity, we have reproducibly observed a lower degree of DNA damage sensitivity in gsk3 mutants relative to wt cells. Hence, while caffeine is less effective at enhancing DNA damage sensitivity relative to torin1 in wt cells; we observed that caffeine and torin1 increase DNA damage sensitivity to a similar degree in gsk3 mutants.

      1. The description and the conclusion of the last paragraph in the Results (bottom in page 8 – page 9) are not supported by the results of Figure 6, due to an insufficient data analysis. The extent of phosphorylation must be quantified as a ratio of the phosphorylated species (e.g., pSsp2) to all species of the protein (e.g., Ssp2).

      We have carefully repeated our experiments under various conditions. Our results clearly indicate caffeine-induced Ssp2 phosphorylation. These observations have not been reported previously.
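
      The normalization requested in the comment above (phosphorylated species relative to all species of the protein) can be sketched as follows. This is a minimal illustration assuming background-corrected band intensities from densitometry are already available; the function names are hypothetical and do not describe how the blots in Figure 6 were actually quantified.

```python
def phospho_ratio(phospho_band, total_band):
    """Ratio of the phosphorylated band (e.g. pSsp2) to the total protein band (e.g. Ssp2)."""
    if total_band <= 0:
        raise ValueError("total protein band intensity must be positive")
    return phospho_band / total_band

def fold_change(treated, control):
    """Fold change of the phospho/total ratio in a treated sample over its control.

    Each argument is a (phospho_band, total_band) pair of band intensities.
    """
    return phospho_ratio(*treated) / phospho_ratio(*control)
```

      Reporting the fold change of the ratio, rather than the raw phospho-band intensity, keeps the comparison independent of differences in total protein loaded per lane.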

      From Figure 6, the authors claim that caffeine (10 mM) partially inhibits TORC1 signaling. However, the authors previously showed that the same concentration of caffeine inhibited phosphorylation of ribosome S6 kinase as strongly as rapamycin, the potent TOR inhibitor (Rallis et al, Aging Cell, 2013). The authors are advised to assess phosphorylation of S6 kinase again in the present study and compare to the results of the present results in Figure 6, because addition of that data may allow the authors to discuss that caffeine affects TORC1 downstream pathways at different intensities.

      While rapamycin is a strong inhibitor of TORC1 in budding yeast, this is not the case in fission yeast. Our previous assessments of p-S6 levels and polysomal profiles as well as cell-cycle progression kinetics have shown this (Rallis et al, Aging Cell, 2013). In addition, gene expression analysis from our previous studies have shown that caffeine treatment results in a gene expression profile similar to that of cells in nitrogen starvation (TORC1 inhibition).

      We have now used an Sck1-HA strain to further enhance our study and address the reviewer’s concerns. Previous studies have shown that 100 ng/mL rapamycin does not affect Sck1 phosphorylation. We demonstrate that, in contrast to rapamycin (100 ng/mL), 10 mM caffeine affects Sck1-HA expression and/or phosphorylation. This effect was also observed with 5 µM torin1, albeit to a greater degree.

      Also, immunoblotting of the same proteins looks somehow different from panel to panel (e.g., pSsp2 in panel A and D; Actin in panel A, C, and D). Therefore, the blotting result before clipping had better be shown as a supplementary material.

      We repeated the blots where necessary and used Ponceau S as a loading control. The original blots can be made available to all.

      Minor comments:

      1. (Figure 1) The septation index of the phleomycin-treated cells (without any further additional drugs) should be shown, as a baseline.

      We have included data for untreated cultures and phleomycin-only treated cultures.
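
      For readers comparing treated cultures against these baselines, the septation index is the percentage of counted cells that show a septum. A minimal sketch, with a hypothetical function name and made-up counts in the usage note:

```python
def septation_index(septated_cells, total_cells):
    """Percentage of counted cells that display a septum."""
    if total_cells <= 0:
        raise ValueError("at least one cell must be counted")
    if not 0 <= septated_cells <= total_cells:
        raise ValueError("septated cells must be between 0 and the total counted")
    return 100.0 * septated_cells / total_cells
```

      For example, 30 septated cells out of 200 counted corresponds to `septation_index(30, 200)`.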

      1. (Figure 1D, Optional) As a ppk18Δ cek1Δ double deletion mutant is reported, the authors are advised to add and test that mutant in this experiment.

      We have added the related data for the _ppk18_Δ _cek1_Δ double mutant.

      1. (Figure 2) The authors need to clarify the number of cell bodies spotted (e.g., in the Figure legend).

      We have modified the figure legend accordingly.

      1. (Figure 3) The different number of cells in micrographs may give a (wrong) impression of the cell proliferation rate. Therefore, it is advisable to use micrographs in which a similar number of cells are shown for conditions with similar cell proliferation rates.

      We have included data to show the cell lengths under different conditions. We find that different conditions greatly affect proliferation rates. For instance, cells do not proliferate in the presence of torin1. We initially sought to investigate if caffeine induces a phenotype in ssp1 mutants by virtue of its interaction with the DNA damage response. The micrographs were included as representative examples and have been now complemented with cell length data.

      1. (Figure 4B) ssp2Δ, not spp2Δ.

      The figure legend has been edited.

      1. (Figure 4) The septation index of the non-treated cells should be shown as a baseline.

      We have included baseline data for untreated wt cells in Figure 1. We have no reason to suspect any of the mutants would provide different results over the time investigated.

      1. (Figure 6B, 6E) What do the black arrows indicate? Figure Legend does not seem to explain them.

      The legend has been modified to indicate what the arrows refer to.

      1. (Figure 6C) Indicate which part of the Maf1-PK blot corresponds to the phosphorylated species, because Maf1-PK is probed with an anti-V5 (not a phosphorylation-specific) antibody.

      These experiments have been carefully repeated under different conditions and the figure is now modified accordingly.

      1. (Figure 6D) gsk3Δssp1Δ, not gs3Δssp1Δ.

      We have deleted this figure and have now replaced it with data we believe is more appropriate.

      Reviewer #1 (Significance):

      As caffeine is implicated in protective effects against diseases including cancer and improved responses to clinical therapies, the topic of the present study is of interest and importance to the broad audience.

      In the present study, the most significant finding is that caffeine- and torin1-induced hypersensitivity to phleomycin is dependent on Ssp1 and Ssp2 (Figure 2). This result may be important in chemotherapy against cancers. On the other hand, caffeine is known to activate AMPK (e.g., Jensen Am J Physiol Endocrinol, 2007). Besides, as detailed in the Major comments, many of the major conclusions are not supported by the present results. Therefore, based on my field of expertise (cell cycle, cell proliferation, and TOR signaling), I conclude that the present study hardly extends the knowledge in the field of "the cell biology of caffeine."

      We thank the reviewer for their helpful comments. We accept the constructive criticisms and have carried out extensive additional experiments to provide further evidence for the roles of Ssp2 and TORC1 in mediating the cell cycle effects of caffeine. We stress that caffeine has previously been proposed to exert its effects via inhibition of Rad3 activity. Our previous work showed that caffeine did not inhibit Rad3-mediated checkpoint signaling. As later studies suggested that caffeine inhibits TORC1 activity, the major goal was to investigate whether caffeine is an indirect inhibitor of TORC1 via Ssp2, which is activated by several stresses. It has never been demonstrated that caffeine signals via Ssp2. This study provides the first evidence that caffeine modulates cell cycle progression by at least partially signaling via Ssp2 and TORC1. After nearly 30 years, it is vital that its precise activity, in particular in enhancing DNA damage sensitivity, is properly characterized. Such work would open the way for additional studies on how caffeine affects cell physiology. For instance, we show that caffeine at 10 mM is more effective at inhibiting Sck1 activity than rapamycin at 100 ng/mL. In contrast, rapamycin at this concentration is more effective at inhibiting Maf1 activity. Hence, further studies are needed on exactly how the combination of caffeine and rapamycin influences ageing and other TORC1-regulated processes.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary: In this paper, Alao and Rallis analyze the role of AMPK and TORC1 pathways, and the respective crosstalk, in regulating cell cycle progression in the presence of DNA damage in S. pombe. The authors show, almost exclusively through chemo-genetic epistasis assays, that caffeine inhibits TORC1 indirectly by activating AMPK, in contrast to the specific ATP-competitive TORC1 inhibitor torin1. Specifically, it is shown that in the absence of a functional AMPK pathway caffeine is unable to revert the TORC1-inhibition-dependent override of cell-cycle arrest caused by the DNA-damaging agent phleomycin, henceforth partially suppressing the growth inhibition caused by the co-treatment.

      Major comments: The overall story of the paper is convincing. However, the choice of an almost exclusively chemo-genetic approach, lack of controls in some experiments and some discrepancy in data presentation suggest that the manuscript undergoes revision before the authors claim that their conclusions are fully supported by the results. In detail:

      In Figure 1, graphs of septation indexes are presented separately for each strain. This presentation prevents the reader from clearly comparing the differences of septation caused by genetic background rather than the treatment, i.e. the septation happening by treatment with torin1. I feel it would be better to group the results by drug rather than by strain/mutant. If the results are presented this way because the experiments on different strains were run separately, I further suggest that they are re-run so to always include at least the wt in every run.

      We have included data for untreated and phleomycin-only treated wt cells as a reference. Additionally, all experiments were repeated at least twice. We have used this assay for over 10 years and have found it to be reproducible and reliable. We are not able to include wt cells in every run, as this would exceed our manpower capacity and time constraints. It is also likely that torin1 activity is influenced by the ssp1/2 backgrounds due to increased basal TORC1 activity, as previously reported. The main goal was to illustrate that caffeine differs from a direct inhibitor such as torin1.

      Furthermore, torin1 inhibits both TORC1 and TORC2 and thus cannot be directly compared to caffeine. We do prove, however, in this and other figures that, in contrast to torin1 and rapamycin, caffeine signals via targets upstream of TORC1. We can therefore deduce that it functions in a manner similar to other environmental and nutrient stresses, which require the Ssp1- and Sty1-regulated pathways to advance mitosis and other processes such as autophagy induction.

      In Figure 2C-D, an inconsistency is observable between the phleo+caffeine sensitivity of ssp1Δ and ssp2Δ, the latter retaining a higher sensitivity. Provided that this is not only due to this specific replicate, how would the authors explain such a difference and fit it into their conclusion of a "cascade" signaling with Ssp1 acting upstream of Ssp2?

      We agree that analyzing the different interacting pathways involved is complex. For instance, Ssp1 is required for suppressing Srk1 following Sty1 activation independently of its effects on Ssp2 and TORC1. Furthermore, basal TORC1 activity is higher in ssp2 mutants, as previously reported. It is likely that Ssp1 exerts a more definitive role, as it is required to directly reactivate Cdc25 activity following exposure to stress. In contrast, Ssp2 activation eventually results in increased Cdc25 activity via inhibition of PP2A (Figure 8). These experiments are thus intended to complement those in Figure 1, but the DNA damaging effects of caffeine must also be taken into account.

      In Figure 2I, a huge discrepancy is observable compared to panel 2A in terms of phleo+caffeine (no ATP) sensitivity of wt cells. Here, cells seem to cope well with the phleomycin treatment even if co-treated with caffeine. This renders the main finding of the panel (the effect of phleo+caffeine+ATP) rather uninterpretable.

      We have noted that relevant assays, at least in fission yeast, are influenced by the culture vessels (e.g., plastic type/ glass) as well as the vessel volume (probably due to different aeration, oxygen availability that affects growth and metabolism parameters). We have corrected figure 1a. In terms of ATP, these experiments are highly reproducible even if the exact mechanism remains unclear.

      In Figure 3A, the simple observation of elongation is sometimes hard to assess, for example in the ATP-caused suppression of the effect of torin 1, as also acknowledge by the authors in the text. I feel it would be really necessary to quantify such results on an adequate number of cells.

      We have reproducibly observed this uncharacterized effect of ATP. We have analysed the cell length in additional experiments to show that ATP influences average cell length under these conditions. It is important to note that the effects of phleomycin are pleotropic. For instance, it likely induces cell cycle arrest at various cell cycle phases as well as in early and late G2. Additionally, it may influence other cellular processes such as DNA or compete with drug targets such as TORC1 which is influenced by ATP.

      In Figure 3B,C wt is missing to compare the results in the presence of the same treatments. I understand the focus on Ssp1, but the authors should show the same treatments on wt cells. Similarly, it would be better to show the drug treatments in panel C also at 30°C. For the same reasons as in the previous point, quantifications would greatly enhance the credibility of the claims here.

      Previous work by other investigators has shown that wt cells proliferate normally under these conditions. We also show in figure 1 that cell proliferation is not affected under nor cycling conditions in these assays. We have added cell length data that convincingly prove that Ssp1 is required to mediate the mitotic effects of caffeine. It appears that caffeine induces a cell cycle delay that requires Ssp1 to suppress Srk1-mediated Cdc25 inhibition. Furthermore, recent studies have demonstrated that rapamycin (which targets TORC1 downstream of Ssp1) allows cell proliferation at higher temperatures in S. pombe.

      A major point is the almost complete absence of molecular data. Except for Figure 6, the data do not include a detection of the relative activation of the relevant pathways. Figure 6 could hardly fill this gap, since the samples therein analyzed are not the ones utilized in most of the other figures, but simple, single time-point treatment with a single drug. The authors usually refer in the text to previous knowledge about how a treatment influences a pathway. However, they should show it here in their experimental conditions.

      We have performed extensive additional experiments including those suggested by the reviewer. These experiments conclusively show caffeine induces Ssp2 phosphorylation in an Ssp1-dependent manner. We also demonstrate that caffeine attenuates TORC1 signaling. Together with the cell cycle data, our findings strongly suggest caffeine indirectly inhibits TORC1 signaling in a manner analogous to other environmental stresses. We also note that the inhibitory effect of caffeine on TORC1 has been demonstrated in several studies. We have provided further evidence for this and have, for the first time, demonstrated that caffeine affects Ssp2.

      Minor comments:

      • A different grouping of the experiments/panels would help the reader. For example, Fig. 2I would fit better together with Fig. 3A, to match the composition of the various chapters of the results.

      We have performed additional experiments as suggested by the other reviewers. We believe the data is now easier to understand.

      Torin 1 is sometimes referred to with a capital T or with a lowercase t, especially in the Figures. I suggest making the nomenclature uniform.

      We have edited the text.

      In the results, the authors state that "ATP may increase TORC1 activity or act as a competitive inhibitor towards both compounds.". It's a little bit odd to refer to ATP as a competitive inhibitor of drugs. I would rather be ATP, the physiological agonist, outcompeting two compounds which are working as ATP-competitive inhibitors.

      We have modified the text accordingly.

      Reviewer #2 (Significance):

      The interplay between TORC1 and AMPK is of great interest in the cell signaling field, basically in every model organism.

      The paper provides a conceptual advance in the field showing a genetic interaction between the two pathways using a model organism which has probably been overlooked so far, which is a pity because S. pombe is the best organism to study G2/M cell cycle/size regulation. The story would be of interest especially for an audience working in cell signaling in microorganisms, but not so much (at least at this stage) for the community working on aging, disease and chemo-/radio-sensitization, contrary to what the authors claim. Furthermore, for the above-mentioned reasons, I feel like the authors are a little bit overshooting when claiming (for example in the abstract and in the discussion), that their work provides a clear understanding of the mechanism.

      As requested by Review Commons, I specify that my expertise is on TORC1/AMPK/PKA pathways, on their crosstalk and their regulation by metabolic intermediates.

      We believe that the additional requested experiments have adequately improved the manuscript and support our presented mechanistic model.

      Caffeine is of interest in cancer biology and biogerontology, as shown by recent reports on metabolic phenotyping, liver function testing, induction of autophagy and interplay with HIF-1, to mention just a few.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      This manuscript examines the genetic requirements for checkpoint override by caffeine in the fission yeast model organism. The main outcome is to show that checkpoint override, which has previously been linked to the downregulation of TORC1, is dependent on the AMPK pathway (Ssp1/Ssp2). Additional analysis of downstream factors and the cross-talking Sty1 pathway implicates Greatwall kinases and Igo1 (PP2A inhibitor - endosulfine analogue) although the pleiotropic nature of these pathways and the rather blunt endpoints of septation index and phleomycin sensitivity makes robust data interpretation difficult.

      Major comments

      For clarity the manuscript would benefit from some restructuring. In particular it would help the reader if the diagram presented in figure 7 was presented first as this would help orientate the reader with the pathways. The mammalian equivalents should be indicated.

      Figure 8 (previously figure 7) summarizes our findings schematically. We believe that it works well at the end as a conclusion to the work and the discussion. Wherever appropriate we have mentioned the mammalian equivalent (e.g., for Rad3).

      For scientific accuracy and clarity the manuscript requires significant attention. For example in the abstract where Rad3 is introduced it is not made clear that this is the fission yeast gene. It would be better to introduce ATR at this point? Another example in the abstract: 'Deletion of ssp1 and ssp2 suppresses...' should read 'Deletion of ssp1 or ssp2 suppresses...' as the two genes are not deleted in the same strain. I would recommend that the authors carefully revise the manuscript paying close attention to each statement. For example on page 4: 'Downstream of TORC1, caffeine failed to accelerate ppk18D but not igo1D and partially overrode DNA damage checkpoint signalling'. It is unclear what the authors mean by accelerate. I assume they mean accelerate cell cycle progression, but there is no direct analysis of cell cycle kinetics in the results. Similarly on page 5: '... ppk18D mutant displayed slower cell cycle kinetics than wild type cells exposed to phleomycin and caffeine or torin1 (Figure 1D)'. However, the figure shows no cell cycle kinetic analysis.

      We have modified the wording of the abstract according to the reviewer’s suggestions.

      We refer to accelerated progression into mitosis and have edited the text where appropriate. Depending on the type of DNA damage, S. pombe cells transiently or permanently arrest cell cycle progression. It is well known that caffeine overrides these cell cycle DNA damage checkpoints. We previously proved that this was not due to Rad3 inhibition. Additionally, TORC1 (which controls the timing of mitosis) inhibition overrides checkpoint signaling. Our aim was to investigate if caffeine mimics this effect at least partially, via activation of Ssp2. We have demonstrated this is the case, although the basal state of the various mutants can complicate the data analysis in terms of cell cycle progression. Following exposure to phleomycin, this septation index peaks at 60 minutes following exposure to caffeine. In ppk18 mutants this peak was delayed by 30 minutes. Thus, wt and ppk18 mutants proceed through mitosis and cytokinesis at different rates (as determined by measuring the septation index).

      The authors appear to make the assumption that 'Inhibition of DNA damage signalling by caffeine and torin1 enhanced phleomycin sensitivity...' (page 6) but then clearly go on to show that the mutants used are sensitive for other unknown reasons. To make this link it would be necessary to artificially impose a G2 delay and show how much and in which circumstances this reverses the effect on sensitivity of caffeine/torin1. The authors should thus be very clear that they cannot equate sensitivity to 'checkpoint over-ride' and adjust their wording and assumptions accordingly. Assumptions on epistasis need to use the same assay and not equate between assays. As an example F1C and F2D do not equate as phleo+caffeine would be expected to be sensitised above phleo+torin1. This is not commented on in the text. Also on page 7 '... ATP also suppressed the ability of torin1 to override DNA damage checkpoint signalling albeit to a lesser degree (Figure 2I).' However, this figure only shows sensitivity, not septation index.

      We accept that these results can be difficult to interpret. Firstly, caffeine appears to modulate cell cycle progression by various means. We previously demonstrated that it stabilizes Cdc25 independently of checkpoint signaling. However, it also activates Ssp2 which subsequently affects Cdc25 activity via PP2A. Its effect on mitosis can thus differ depending on the context. For instance, igo1 mutants already have high PP2A activity which would affect the subsequent effect of caffeine on Cdc25 activity. Ssp2 on the other hand appears to regulate cell fate according to the nutritional state. Its sensing of nutritional cues is not limited to ATP/ AMP levels as it also regulates the response to amino acid quality (e.g., glutamate versus torin1).

      We have carried out additional experiments on the effect of ATP. While it did affect progression into mitosis, the results were complicated and have not been shown. Instead, we have provided additional data to show that it affects cell length, which is an indicator of the time spent in G2. In other words, longer cells spend more time in G2 prior to septation.

      We also suspect that caffeine is itself a DNA damaging agent as previously reported in the early 1970s. More recent studies have also indicated a role for Rad3 and DNA repair proteins for tolerance to caffeine. In fact, TORC1 itself has been reported to be required for DNA damage repair. Thus, TORC1 inhibition could potentially enhance DNA damage sensitivity independently of mitotic progression as shown in some of our experiments.

      While we have clearly identified a role for Ssp2 in mediating the cell cycle effects of caffeine, we accept that these findings will require further studies (beyond the scope of this one) to give more insight into how these caffeine-mediated effects occur. What is clear is that caffeine overrides DNA damage checkpoint signaling by at least partially inhibiting TORC1 signaling.

      All the septation index graphs require an untreated (I.e no caffeine or torin1) control.

      We now show in figure 1a, that the septation index does not change over the time period studied, when cells were left untreated. These assays have been routinely used for many years now and are very reproducible. The graphs clearly show the differential effects caffeine and torin1 exert on cell cycle progression in wt and mutant strains exposed to phleomycin.

      Figure 3 is not quantitative and cannot support the conclusions drawn from it. If, for example, the authors wish to demonstrate ATP can suppress checkpoint override (Figure 3A) they should use the same septation assay used before. If this is not possible, then it should be explained why not and an alternative quantitative assay should be developed. It is unclear why the authors include Figure 3B,C at all.

      Ssp2, on the other hand, appears to regulate cell fate according to the nutritional state. Its sensing of nutritional cues is not limited to ATP/AMP levels as it also regulates the response to amino acid quality (e.g., glutamate versus torin1). Additionally, exposure to stress may induce a transient decline in ATP levels. We thus investigated how ATP might affect caffeine or torin1. We could not detect any major changes in the septation index (not shown). Cells exposed to ATP in the presence of caffeine and phleomycin were shorter. We cannot tell exactly how ATP suppresses the effect of caffeine and torin1 on DNA damage sensitivity.

      It is unclear to this reviewer what the significance of the data with gsk3D cells is (Figure 5). The authors should introduce the protein, why there is an expectation that it would have a role in the pathway and explain its relevance. Similarly when discussing the resulting data.

      Gsk3 lies downstream of TORC2, which is inhibited by torin1 but not caffeine. Gsk3 regulates the stability of Pub1, the E3 ligase for Cdc25. We showed previously that caffeine stabilizes Cdc25, suggesting it might interfere with Pub1 activity. Additionally, we are comparing caffeine, as an indirect inhibitor of TORC1, with torin1, which directly inhibits both complexes. Our data provide further evidence for a differential effect of caffeine and torin1 on TORC1 signaling. We have modified the text accordingly.

      Figure 5A shows a similar response of wild type cells to phleomycin regarding checkpoint override as was shown in Figure 1A. However Figure 5C is not recognisable as equivalent to Figure 2A, yet both report sensitivity to phleomycin of wild type cells under equivalent circumstances. This is a major concern as to reproducibility of these data. It is also not possible to conclude from either Figure 5C or 5D that caffeine or torin1 treatment is, or is not, sensitising cells to phleomycin treatment, yet this conclusion is made when discussing the data.

      We agree with this and other reviewers that demonstrating enhanced sensitivity to caffeine is problematic. Nonetheless, our cell cycle data clearly indicate a differential role for Gsk3 in mediating the cell cycle effects of caffeine and torin1. In terms of DNA damage sensitivity, we have reproducibly observed a lower degree of DNA damage sensitivity in gsk3 mutants relative to wt cells. Hence, while caffeine is less effective at enhancing DNA damage sensitivity relative to torin1 in wt cells, we observed that caffeine and torin1 increase DNA damage sensitivity to a similar degree in gsk3 mutants.

      Figure 6A shows that caffeine, but not torin1 results in Ssp2 phosphorylation. Is this experiment reproducible and does the total level of Ssp2 increase reproducibly? This should be done and the results discussed. Ideally, the bands would be quantified against actin intensity and presented as a bar graph with standard deviation.
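
      The quantification requested here can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names and the densitometry values are hypothetical, and it assumes each phospho-Ssp2 band is paired with an actin band from the same lane and replicate.

```python
from statistics import mean, stdev


def normalize_to_loading_control(target, control):
    """Divide each target band intensity by the loading-control (e.g. actin)
    intensity measured in the same lane of the same replicate."""
    if len(target) != len(control):
        raise ValueError("target and control need one value per replicate")
    return [t / c for t, c in zip(target, control)]


def summarize(ratios):
    """Return (mean, sample standard deviation) across replicates."""
    return mean(ratios), stdev(ratios)


if __name__ == "__main__":
    # Hypothetical band intensities (arbitrary units), three replicates each.
    phospho_ssp2 = {"untreated": [110.0, 95.0, 102.0],
                    "caffeine": [310.0, 280.0, 335.0]}
    actin = {"untreated": [1000.0, 980.0, 1010.0],
             "caffeine": [990.0, 1005.0, 1020.0]}

    for condition in phospho_ssp2:
        ratios = normalize_to_loading_control(phospho_ssp2[condition],
                                              actin[condition])
        avg, sd = summarize(ratios)
        print(f"{condition}: mean = {avg:.3f}, SD = {sd:.3f}")
```

      The per-condition mean and standard deviation are the values that would be drawn as bar heights and error bars. The same normalization could be applied against total Ssp2 instead of actin to separate a change in phosphorylation from a change in protein level.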

      We have repeated these experiments alone and in combination with phleomycin. These data convincingly show that caffeine but not torin1 induces Ssp2 phosphorylation. In fact, torin1 suppresses Ssp2 phosphorylation, likely due to inhibition of a feedback mechanism resulting from TORC1 inhibition. In contrast, caffeine likely activates Ssp1 via the stress response, which in turn phosphorylates Ssp2.

      Figure 6B, when introduced should explain the background as to why eIF2alpha phosphorylation is a readout of TORC1 activity. Importantly, the figure should be supported by an actin control and 3 repeats quantified. Figure 6C purports to establish that caffeine moderately attenuates Maf1 phosphorylation. To be able to state this, it would be essential to quantify the gel and report repeated results relative to actin and the total levels of Maf1. Similarly Figure 6D and 6E require an actin control and would benefit from proper quantification.

      We have repeated the Maf1 experiments to clarify the data and show that caffeine suppresses Sck1, an additional TORC1 phosphorylation target.

      Minor comments

      p3 'cigarette smoke and other gases'?

      We have edited the statement.

      P4 torin1 was dissolved in DMSO (not were)

      We have edited the text.

      p5 phospho not phosphor Ssp2

      We have edited the text.

      p6 explain why ppk18 deletion results are surprising. Also this result could be discussed.

      It had been proposed previously that Ppk18 is the Greatwall homologue in S. pombe and thus the major regulator of PP2A and mitosis downstream of TORC1. Later studies suggested a redundant role for Cek1 in this pathway. While deletion of cek1 in a ppk18 background modulated the effect of torin1 on cell cycle progression, it did not interfere with the effects of caffeine. At present we cannot account for this observation. We cannot rule out that caffeine activates an additional kinase that regulates Igo1 activity.

      Together our data show that caffeine advances progression into mitosis in a manner that differs from direct inhibition of TORC1 by torin1.

      We have now added the relevant comments on this unexpected observation within the discussion.

      Explain why Cek1 is not tested

      We have now tested a ppk18 cek1 double mutant.

      p6 introduce what pap1 is when first mentioned

      We have introduced PP2APab1 as requested.

      Reviewer #3 (Significance):

      The data show that fission yeast Ssp1/2 has a role in inhibiting TORC1 in response to caffeine and this influences checkpoint override. This is an incremental, but potentially interesting, observation contributing to understanding mechanism(s) of caffeine action. The lack of quantification, the pleiotropic nature of the mutants used and the rather blunt endpoints assayed make it hard to establish to what extent the direct TORC1 inhibition by Ssp2 causes the checkpoint override, which limits its potential impact. The core observation may, however, be of interest to the wider caffeine field. The referee has the perspective of a yeast cell cycle geneticist.

      We thank the reviewer for identifying the significance of the study in understanding the mechanisms of caffeine effects on the cell cycle. We have added all the suggested experiments with additional mutants and protein markers as well as quantitative approaches that have appropriately improved the manuscript. We believe that the mechanism provided is of more general interest and not limited to the caffeine field: manipulating the cell cycle and understanding the interplays between growth and stress are of general interest and importance.

      Reviewer #4 (Evidence, reproducibility and clarity):

      The authors provide a series of genetic studies identifying a role for Ssp1-Ssp2 signaling in TORC1-dependent responses to DNA damage. The main assays are cell division (i.e. septation index) and cell viability (i.e. serial dilution spot assays) following treatment with the DNA damaging agent phleomycin. The authors perform these assays in a number of genetic mutant backgrounds to determine which genes and pathways are required for the relevant cellular response. Supporting data also include microscopy images and western blots to test protein phosphorylation. In general, the results support a role for Ssp1-Ssp2 acting upstream of TORC1. However, in several cases the data do not support a straightforward relationship, and it is confusing to parse through a number of intermediate effects, which often vary between different assays. I have provided some specific comments below that might be addressed to strengthen the technical aspects of the manuscript.

      Major

      1. The authors conclude "that caffeine and torin1 indirectly and directly inhibit TORC1 activity respectively" based on Figure 1. This conclusion seems quite strong given the indirect nature of assays in Figure 1, which test septation in the presence of DNA damage. The conclusion would require experiments that assay TORC1 activity itself.

      Both caffeine and torin1 have previously been reported to inhibit TORC1 which controls the timing of mitosis. We sought to investigate if caffeine mediates its effects via the stress response pathway. We have conducted additional experiments which clearly demonstrate that caffeine inhibits TORC1 at least partially via the activation of Ssp2. These observations make sense as we have previously shown that caffeine activates the stress response pathway to activate Srk1 which inhibits Cdc25. More recent studies by others indicate that Ssp1 is required to suppress Srk1 to allow progression into mitosis. This accounts for the failure of ssp1 mutants to advance mitosis under stress conditions. Additionally, Ssp1 activates Ssp2 which leads to the downstream inhibition of TORC1.

      1. Figure 2 needs some explanation to introduce the idea that cell growth reflects an intact DNA damage response that prevented division in the presence of phleomycin. I also felt that the conclusions were very strong given the data, and the authors should discuss each case more carefully. For example, deletion of ssp1 does not really suppress the ability of torin1 to enhance phleo sensitivity (Figure 2C).

      We would not expect the deletion of ssp1 to suppress the effect of torin1 under stress conditions. We have provided further evidence to show that Ssp1 is required to facilitate progression into mitosis at least in the presence of phleomycin or heat stress.

      1. Microscopy imaging in Figure 3 nicely complements some of the other assays. However, it seems important to know if the cells are actively growing in each of these cases. An example is torin and rapamycin shortening ssp1 mutants at 35 degrees: are these cells actively cycling?

      Our aim was to demonstrate that caffeine exacerbates the ssp1 phenotype. This would provide further evidence to show that caffeine exerts its effects at least in part by activating Ssp1. Cells do not cycle in the presence of torin1 as it inhibits both TORC complexes. We have provided additional evidence to show that caffeine does indeed interact with Ssp1. As the primary aim of the study was to determine if caffeine overrides DNA damage via Ssp1, we have not investigated if they are cycling. Their shortened size suggests that rapamycin and torin1 affect cell division in a different manner from caffeine.

      1. From Figure 6A, the authors conclude that caffeine induces phosphorylation of Ssp2. However, it appears that Ssp2 protein levels and its phosphorylation levels are both increased, which seems an important distinction.

      We have repeated these experiments several times under different conditions. Some proteins become more stable when phosphorylated as has been previously demonstrated for Srk1 for instance.

      1. In Figure 6D, the authors should show separate gsk3 and ssp1 mutants. It seems likely that all phosphorylation of Ssp2 is due to Ssp1, but this should be shown.

      We have replaced the figure with a ssp1 single mutant.

      1. I am confused about Maf1 phosphorylation in Figure 6C. It is increased upon torin1 treatment, but it is discussed as an indicator of TORC1 activity. Does that mean that loss of its phosphorylation correlates with increased TORC1 activity? As written, I thought it was a TORC1 substrate, which led to confusion about its increased phosphorylation upon torin1 treatment.

      Maf1 is phosphorylated by TORC1. Inhibition of TORC1 would thus lead to a loss of phospho-Maf1 moieties and the accumulation of the unphosphorylated form. We have conducted additional experiments and under various conditions to show that caffeine weakly inhibits Maf1 phosphorylation. We note however, that different stresses result in differential outcomes following TORC1 inhibition. As such we have included new data to show that caffeine suppresses the TORC1 target Sck1. In S. pombe Sck1 and Sck2 regulate progression into mitosis.

      Minor

      1. An untreated control should be shown for assays in Figure 1.

      We have included this data for figure 1a.

      1. An untreated control should be shown for assays in Figure 4.

      We have noted in the results for figure 1, that untreated cells and phleomycin only treated cells do not show any changes in septation index over the time course studied in these experiments.

      Reviewer #4 (Significance):

      The study has significance in connecting several conserved and central signaling pathways including TORC1, AMPK, and PP2A. Also, the study uses caffeine and torin1 that have effects in many different cell types. The connection between caffeine and torin1 effects on phleomycin-treated cells was previously established by these researchers. The significance of the current study is providing a genetic pathway for this connection. The significance is partly limited by some of the technical points raised in the previous section, such as some inconsistencies in the strength of results from different assays. Also, the role of these pathways in DNA damage response signaling is not new. While the main significance of this work might relate to a more specialized audience, it does add to a broader body of literature regarding these conserved pathways and processes.

      My expertise is yeast cell biology.

      While the roles of these pathways in DNA damage have been reported using genetic and pharmacological combinations, we dissect their relationships and provide mechanistic connections.

      We thank the reviewer for identifying the significance of this study. We believe we have now addressed the technical issues raised.

    1. Reviewer #3 (Public Review):

      This paper proposes a computational account for the phenomenon of pattern differentiation (i.e., items having distinct neural representations when they are similar). The computational model relies on a learning mechanism of the nonmonotonic plasticity hypothesis, fast learning rate and inhibitory oscillations. The relatively simple architecture of the model makes its dynamics accessible to the human mind. Furthermore, using similar model parameters, this model produces simulated data consistent with empirical data of pattern differentiation. The authors also provide insightful discussion on the factors contributing to differentiation as opposed to integration. The authors may consider the following to further strengthen this paper:

      The model compares different levels of overlap at the hidden layer and reveals that partial overlap seems necessary to lead to differentiation. While I understand this approach from the perspective of modeling, I have concerns about whether this is how the human brain achieves differentiation. Specifically, if we view the hidden layer activation as a conjunctive representation of a pair that is the outcome of encoding, differentiation should precede the formation of the hidden layer activation pattern of the second pair. Instead, the model assumes such pattern already exists before differentiation. Maybe the authors indeed argue that mechanistically differentiation follows initial encoding that does not consider similarity with other memory traces?

      Related to the point above, because the simulation setup is different from how differentiation actually occurs, I wonder how valid the prediction of asymmetric reconfiguration of hidden layer connectivity pattern is.

      Although, as the authors mentioned, there haven't been formal empirical tests of the relationship between learning speed and differentiation/integration, I am also wondering to what degree the prediction of fast learning being necessary for differentiation is consistent with current data. According to Figure 6, the learning rates that lead to differentiation in the 2/6 condition achieved differentiation after just one shot most of the time. On the other hand, Guo et al (2021), for example, showed that humans may need a few blocks of training and test to start showing differentiation.

      Related to the point above, the high learning rate prediction also seems to be at odds with the finding that the cortex, which has slow learning (according to the theory of complementary learning systems), also shows differentiation in Wammes et al (2022).

      More details about the learning dynamics would be helpful. For example, equation(s) showing how activation, learning rate and the NMPH function work together to change the weight of connections may be added. Without the information, it is unclear how each connection changes its value after each time point.

      In the simulation, the NMPH function has two turning points. I wonder if that is necessary. On the right side of the function, strong activation leads to strengthening of the connectivity, which I assume will lead to stronger activation on the next time point. The model has an upper limit of connection strength to prevent connection from strengthening too much. The same idea can be applied to the left side of the function: instead of having two turning points, it can be a linear function such that low activation keeps weakening connection until the lower limit is reached. This way the NMPH function can take a simpler form (e.g., two line-segments if you think the weakening and strengthening take different rates) and may still simulate the data.
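
      The comparison raised in the two points above can be made concrete with a short sketch. It is purely illustrative and is not the model's actual implementation: the knot positions, the update rule (weight change = learning rate × NMPH value, clipped to fixed bounds) and all function names are assumptions introduced here.

```python
def piecewise_linear(x, knots):
    """Linearly interpolate through (x, y) knots sorted by x; clamp outside."""
    if x <= knots[0][0]:
        return knots[0][1]
    if x >= knots[-1][0]:
        return knots[-1][1]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def nmph_u_shaped(activation):
    """U-shaped rule: no change at low activation, weakening at moderate
    activation, strengthening at high activation (hypothetical knots)."""
    knots = [(0.0, 0.0), (0.2, 0.0), (0.4, -1.0), (0.6, 0.0), (1.0, 1.0)]
    return piecewise_linear(activation, knots)


def nmph_two_segment(activation):
    """Simplified alternative: two line segments, weakening below a threshold
    and strengthening above it (hypothetical knots)."""
    knots = [(0.0, -1.0), (0.6, 0.0), (1.0, 1.0)]
    return piecewise_linear(activation, knots)


def update_weight(weight, activation, learning_rate, nmph,
                  w_min=0.0, w_max=1.0):
    """Assumed update: add learning_rate * nmph(activation), then clip the
    result to the lower and upper limits on connection strength."""
    new_weight = weight + learning_rate * nmph(activation)
    return max(w_min, min(w_max, new_weight))


if __name__ == "__main__":
    for a in (0.1, 0.4, 0.8):
        print(a,
              update_weight(0.5, a, 0.2, nmph_u_shaped),
              update_weight(0.5, a, 0.2, nmph_two_segment))
```

      Under these assumptions the two rules differ only at low activation: the U-shaped rule leaves weakly active connections untouched, whereas the two-segment rule keeps weakening them until the lower bound is reached.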

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript by Berg et al the authors demonstrate that RNA polymerase activity is important for the formation of nuclear blebs. This is an interesting and significant finding because prior work has suggested nuclear bleb formation is a result of changes in nuclear rigidity (lamins) or chromatin (via histone modifications). Overall I thought the manuscript was quite interesting and the data well presented. I think the inclusion of multiple mechanisms of blebbing (VPA treatment, as well as lamin B KO) helps to further support the importance of RNA polymerase/transcription activity in the blebbing process. However, I do have some concerns regarding the conclusions of the data that I think should be addressed as a revision.

      We appreciate that the Reviewer states that “the manuscript was quite interesting and the data well presented”, it is a “significant advancement”, and “the first report of this phenomena, and thus will be impactful to the nuclear mechanics field.”

      In the points below, the Reviewer specifically suggests that we: 1) clarify possible contributions from RNA pol III, 2) address how global vs. local chromatin motion might contribute to our findings, and 3) discuss the force production capabilities of RNA pol II. We also appreciate the feedback regarding the conclusions and have made the specific changes requested in the revision.

      Major Comments:

      1. One concern I have is that the alpha-amanitin inhibitor has been shown to also inhibit RNA polymerase III. In an old study (1974 Weinmann PNAS) it appears that the inhibitor starts to inhibit RNA polymerase III at 1 to 10 ug/ml. In this study the authors are using 10 uM alpha-amanitin, which is ~ 9 ug/ml and within the range of inhibiting some RNA polymerase III. Additionally, the other drug (actinomycin D) is even less specific for RNA polymerase II. I would suggest that the authors consider one of the following approaches 1) acknowledge in the manuscript the potential for RNA polymerase III to be important in the blebbing process 2) try a 10-fold lower dose of alpha-amanitin and see if that also inhibits blebbing, 3) try to find a way to demonstrate that RNA polymerase III activity is not inhibited at the 10 uM alpha-amanitin dosage, or 4) consider an alternate method to perturb RNA polymerase II activity (see Zhang Science Advances 2021 for an auxin-based approach to downregulate RNA polymerase II).
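
      The dose comparison in this comment rests on a molar-to-mass conversion, which can be checked with a short sketch. The molecular weight used for alpha-amanitin (918.97 g/mol) is an assumed reference value that does not come from the manuscript or the review, and the function name is hypothetical.

```python
# Assumed molecular weight of alpha-amanitin in g/mol (not from the manuscript).
ALPHA_AMANITIN_MW = 918.97


def micromolar_to_ug_per_ml(conc_um, mol_weight):
    """Convert a concentration from uM to ug/ml.

    uM is umol/L; multiplying by g/mol gives ug/L, and dividing by 1000
    converts ug/L to ug/ml.
    """
    return conc_um * mol_weight / 1000.0


if __name__ == "__main__":
    for dose_um in (10.0, 1.0):  # dose used, and a 10-fold lower dose
        ug_ml = micromolar_to_ug_per_ml(dose_um, ALPHA_AMANITIN_MW)
        print(f"{dose_um} uM is about {ug_ml:.2f} ug/ml")
```

      With this molecular weight, 10 uM comes to roughly 9.2 ug/ml, in line with the reviewer's estimate of about 9 ug/ml, and a 10-fold lower dose would sit near the bottom of the 1 to 10 ug/ml range cited.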

      The Reviewer raises the point that alpha-amanitin inhibits both RNA pol II and III. In the revised manuscript, we provide new data to further support that the observed effects arise from RNA pol II. We now include new data from cells treated with the transcription inhibitors flavopiridol (which inhibits RNA pol II elongation) and triptolide (which inhibits RNA pol I and II initiation). These transcription inhibitors also suppress nuclear blebbing in VPA-treated nuclei (Figure 2C) as well as three other nuclear blebbing perturbations in chromatin and lamins (Supplemental Figure 1A). These new experiments directly show that nuclear bleb suppression by transcription inhibitors can be observed without possible inhibition of RNA pol III by alpha-amanitin.

      __ A second concern I have is that the inhibition of RNA polymerase is global. Thus it is difficult to know for sure the biophysical function of the polymerase occurs immediately at the bleb, or instead is somehow affecting the overall chromatin state throughout the entire nucleus. I agree that figure 3 does provide some evidence that major mechanical and biophysical properties of the nuclei are not changed in response to the inhibition of the polymerase. However, micromanipulation experiments are done with isolated nuclei, which may be somehow mechanically altered already by isolation from cells. I feel that there still must be given some consideration in the discussion of the possibility that RNA polymerase activity outside of the bleb may be having some role in the stabilization of the chromatin and blebbing propensity.__

      We appreciate the Reviewer’s insightful comments and we have revised the manuscript to clarify that we do not attribute blebbing purely to local effects. Instead, we argue that global changes in chromatin motion driven by transcription could contribute to nuclear blebs.

      We did not intend to communicate that alterations to chromatin or its dynamics were necessarily only local. Indeed, we found that the relative levels of RNAP Ser2 and Ser5 phosphorylation were different inside the blebs (Figure 6). Nonetheless, transcription was perturbed globally in our experiments, so blebbing could be driven by global changes (Figure 1). We hypothesize that global regulation of transcription can stimulate nuclear blebbing, since transcription and its inhibition can, respectively, drive and suppress correlated chromatin motion throughout the entire nucleus (as previously observed by Zidovska et al. (PNAS 2013) and Shaban et al. (NAR 2018, Genome Biol. 2020), among others). We have revised the manuscript to clarify this point (Discussion section, page 15). We have also added new simulation snapshots showing global chromatin motions and how these motions are coupled to nuclear morphology (Figure 7C).

      In response to the concern that isolated nuclei exhibit different mechanical properties than nuclei inside of cells, we refer to our previously published micromanipulation measurements (Stephens et al. MBoC 2017). There, we found that nuclei within the cell and outside of the cell have quantitatively similar spring constants and qualitatively similar force-extension curves. Therefore, we are confident that the lack of change in nuclear stiffness measured by micromanipulation accurately reflects the mechanics of nuclei inside of cells across different perturbations.

      __ While I lack expertise to evaluate the basis of the model, I appreciate the model can show that motor activity can influence bulge. But it is not clear in the manuscript that RNA polymerase can generate these kinds of forces. The Liu citation is a model, and does not provide direct evidence that the RNA polymerase can generate force, or forces large enough to be meaningful. To me the model in this paper (Figure 7) felt as if it was only a possible hypothesis of why the RNA polymerase has an effect on blebbing, but I imagine there could be other hypotheses that would cause the same effect. The authors state (in the abstract) that RNA pol II can generate active forces, but I am concerned this is not sufficiently established. Since this motor/force activity of RNA polymerase is not experimentally demonstrated in this paper the authors should either do a better job of including evidence of this from the literature or consider removing this part of the manuscript.__

      RNA polymerase is capable of exerting forces in excess of 10 pN (e.g., see Wang et al. Science 1998; Herbert et al., Annu Rev Biochem 2008). The collective activity of many motors (tens of thousands, e.g., see Zhao et al. Proc. Natl. Acad. Sci. 2014) may generate even larger forces. As discussed in our earlier modeling paper, this force scale is consistent with the motor strengths studied in our simulations (Liu et al. Phys. Rev. Lett. 2021); in the present work, we present simulation results for motors that generate 0.14 pN forces. Thus, transcription could, in principle, generate forces even larger than the ones we considered in the model.
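
      To make the comparison of force scales concrete, the two numbers quoted above can be put side by side. This is only an illustrative sketch: the constant and function names are hypothetical, and the calculation does nothing more than take the ratio of the approximate single-polymerase force (~10 pN) to the simulated motor force (0.14 pN).

```python
# Compare the measured single-polymerase force scale with the simulated motor force.
# Both values are taken from the text above; the names are illustrative only.
SINGLE_POLYMERASE_FORCE_PN = 10.0  # approximate force one RNA polymerase can exert
SIMULATED_MOTOR_FORCE_PN = 0.14    # motor force used in the simulations


def force_ratio(reference_pn: float, simulated_pn: float) -> float:
    """Return how many times larger the reference force is than the simulated one."""
    if simulated_pn <= 0:
        raise ValueError("simulated force must be positive")
    return reference_pn / simulated_pn


ratio = force_ratio(SINGLE_POLYMERASE_FORCE_PN, SIMULATED_MOTOR_FORCE_PN)
print(f"A single polymerase can exert roughly {ratio:.0f}x the simulated motor force")
```

      The ratio shows that the simulated motors are far weaker than a single polymerase, so the model does not rely on unrealistically strong forces.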

      Additional experiments indicate that at larger length scales, RNA polymerase activity appears to drive coherent motions of chromatin throughout the cell nucleus (Zidovska et al. PNAS 2013; Shaban et al. NAR 2018; Shaban et al. Genome Biol 2020). It is these motions, driven by motors, that appear to drive the formation of nuclear bulges in our model (please see new panel Figure 7C).

      Therefore, the aim of the model is to build on established and new results to better understand how transcription could alter nuclear morphology. Our model is adapted from earlier models, which reproduce observations of chromatin-based nuclear rigidity (Stephens et al. MBoC 2017, Banigan et al. Biophys J 2017, Strom et al. eLife 2021) and some aspects of nuclear morphology (Banigan et al. Biophys J 2017, Lionetti et al. Biophys J 2020), and which may explain how nonequilibrium motor activity (such as RNA pol II) can drive coherent chromatin dynamics (Liu et al. PRL 2021) of the kind observed in live-cell imaging experiments (e.g., Zidovska et al. PNAS 2013; Shaban et al. NAR 2018; Shaban et al. Genome Biol. 2020, among others). The precise form of the motor activity is not the focus of our model (or the previous motor model in Liu et al. PRL 2021). Instead, our simulation results indicate that the relatively small motor forces that generate coherent chromatin dynamics could explain the surprising observation that transcription is a critical component of nuclear blebbing.

      To address the Reviewer’s comment, we have added additional text to the Introduction and the Results sections to support the inclusion of motors to model the possible effects of transcription on chromatin dynamics and nuclear shape.

      In the Introduction (page 4), we now write:

      Simulations suggest that chromatin connectivity combined with the forces generated by polymerase motor activity (~10 pN per polymerase (Herbert et al. 2008)) could generate these dynamics (Liu et al., 2021).

      In the Results section (page 10), we write:

      We consider motors that generate sub-pN forces, well below the 10 pN forces that may be generated by individual RNA polymerases (Herbert et al. 2008).

      Additionally, we have updated Table 1 to include the simulated motor strength.

      __ Minor Comments: 1. Did the authors do any analysis to see if the increased RNA transcription with VPA treatment (Figure 1B) has any spatial relationship to where the bleb occurs? Could an analysis of this be done similar to Figure 6 (with a bleb/body ratio)?__

      The Reviewer raises an interesting point about measuring RNA localization relative to the bleb. We measured RNA intensity in the bleb and the nuclear body for wild type cells only. We find that RNA levels are significantly decreased in the bleb (80% of body signal, p

      __ Is there anything known about lamin B1 KO cells as to whether or not they have increased transcription? Or could the authors do an analysis like they did with VPA treatment to check this? If they were to have increased transcription this would further support the authors' proposed mechanism of transcription itself (or RNA polymerase activity) driving blebbing.__

      In the revised manuscript, we show that blebbing induced by several nuclear perturbations that are known to decrease nuclear stiffness also relies on active transcription. Lamin B1 knockout or knockdown has been shown to change transcription. However, it was difficult to find data showing whether the overall level of transcription changes. Collaborators of ours have unpublished data indicating that twice as many genes are upregulated as downregulated upon lamin B1 knockdown, but this still does not assess the total level of transcription within the nucleus. Alternatively, increasing transcription via other means is fraught with off-target effects, which would require many additional complementary experiments. We thank the Reviewer for this interesting suggestion, but we believe this is beyond the scope of this manuscript, in which we have focused on showing that transcription inhibition suppresses bleb formation.

      __ Figure 1D, the VPA ser2 image appears much brighter than the untreated image. Yet the graph shows they are similar. Perhaps a more representative image should be used?__

      The image is representative of the data: Ser2 signal is brighter (by ~10%) in VPA-treated cells, but this difference from wild type (unt) is not statistically significant.

      __ Can the authors comment if there is less DNA at the bleb site? In Figure 6 A this appears to be the case (based on the VPA image). If true, is the alpha-amanitin treatment rescuing this such that there is more DNA at the bleb (maybe causing the bleb to be smaller?).__

      We find that there is less DNA signal intensity per unit area in the nuclear bleb as compared to the nuclear body (bleb has ~60% the signal of the body; see teal dots/data in Figure 6B). This agrees with previously published work from our lab (Stephens et al. 2018 MBoC).
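
      A bleb/body ratio of this kind can be sketched as the mean intensity per pixel in the bleb divided by the mean intensity per pixel in the nuclear body. The snippet below is a hypothetical illustration, not the analysis pipeline used in the manuscript: the function name, the optional background term, and the pixel values are all made up for the example.

```python
from statistics import mean


def bleb_to_body_ratio(bleb_pixels, body_pixels, background=0.0):
    """Return mean bleb intensity divided by mean body intensity.

    Each mean is an intensity per pixel (per unit area), so regions of
    different size can be compared directly. The background value is an
    assumed constant offset subtracted from both regions.
    """
    bleb_mean = mean(bleb_pixels) - background
    body_mean = mean(body_pixels) - background
    if body_mean <= 0:
        raise ValueError("body intensity must exceed background")
    return bleb_mean / body_mean


# Made-up pixel intensities, chosen only to illustrate the calculation.
bleb = [60, 62, 58]
body = [100, 98, 102, 100]
example_ratio = bleb_to_body_ratio(bleb, body)
print(f"bleb/body ratio: {example_ratio:.2f}")
```

      A ratio below 1 indicates that the signal is depleted in the bleb relative to the body; a ratio near 1 indicates no enrichment or depletion.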

      Alpha-amanitin treatment does not rescue this effect. Decreased DNA enrichment in the bleb remains with alpha-amanitin treatment (p > 0.05, comparing across all 4 conditions in Figure 6B).

      __ What is the significance of bleb vs non-bleb nuclear rupture? Is there anything known in the literature as to how these ruptures may be different in terms of biophysics, impact to DNA, repair? It would be helpful to have some context, as well as to understand if non-bleb rupture is something that may have been previously missed in other contexts.__

      The Reviewer asks a valid and interesting question that this manuscript only begins to address. In general, we believe that ruptures occurring with blebs vs. without blebs may reflect aspects of the underlying mechanism(s) of blebbing and rupture, in the presence or absence of transcription. We offer a few further thoughts below.

      1) Non-bleb nuclear ruptures have been reported in a few papers by our group (Stephens et al., 2019 MBoC) and others (Chen et al., 2018 PNAS), but much is still unknown.

      2) Non-bleb nuclear rupture is part of normal nuclear behavior, as it accounts for ~20% of nuclear ruptures in wild type and perturbed cells (VPA and LMNB1-/-).

      3) Overall, we think that bleb-based and non-bleb-based ruptures may occur through different mechanisms. The simplest difference is that bleb-based nuclear ruptures follow from the nucleus’ ability to form blebs, whereas non-bleb-based nuclear rupture occurs in cases where there is less bleb formation, suggesting that factors other than the ability to form blebs may also be important for rupture. In the current study, we observed that bleb-based nuclear ruptures (and bleb formation) require transcription. In another manuscript from our lab, currently under review, we show that bleb-based nuclear ruptures (and nuclear blebbing) can be suppressed by inhibiting actin contraction and increased by enhancing actin contraction (Pho et al., biorxiv 2022).

      Additionally, we note it was reported that non-bleb-based nuclear ruptures, at least some of which are driven by microtubule prodding, result in increased levels of DNA damage (Earle et al. Nat Mater 2020), as has been observed for bleb-based ruptures (Stephens et al., 2019 MBoC; Xia et al. J Cell Bio 2018). Thus, nuclear rupture in general is thought to lead to DNA damage. However, total levels of DNA damage due to rupture may be controlled by different cellular processes.

      In the revision, we have clarified our motivation for quantifying ruptures with and without blebs. We have also added a few remarks, drawn from the above comments, to the Discussion section (pages 11-14).

      Reviewer #1 (Significance (Required)):

      General assessment: This study is a careful analysis of how RNA polymerase inhibition reduces nuclear blebbing. The study demonstrates this very well, using a variety of approaches. However, some limitations are the overstatement of some conclusions (specifically that it is RNA polymerase II when the inhibitor may also affect RNA polymerase III; that the RNA polymerase activity is important at the bleb and involves motor activity). Advance: This paper is a significant advancement because it shows the role of transcription in the biophysics of the nuclear shape. To my knowledge this is the first report of this phenomena, and thus will be impactful to the nuclear mechanics field. Audience: I think the findings are of broad interest, including beyond the nuclear mechanics field. I think the audience would be the entire cell biology community. Expertise: My expertise is in cell mechanics, including forces at the nuclear LINC complex. While I do not work in the field of nuclear blebbing and rupture, I follow this field quite closely.

      We greatly appreciate the Reviewer’s statement that “To my knowledge this is the first report of this phenomena, and thus will be impactful to the nuclear mechanics field.” We thank the Reviewer for their thoughtful comments and suggestions, which have helped to improve the manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors present data supporting the potential involvement of active transcription in the formation of nuclear blebs when the global deacetylase inhibitor valproic acid (VPA) has been applied to cells

      Reviewer #2’s greatest concern throughout the review was that we focused on VPA as a model for generating increased nuclear blebbing and on 24-hour treatment with alpha-amanitin as a transcription inhibitor. In the revised manuscript, we provide new data to show that nuclear blebbing generated by a variety of different nuclear perturbations (VPA, DZNep, LMNB1-/-, and LA KD; Figure 2D and Supplemental Figure 1A) is reliant on active transcription in two different cell lines (MEF and HT1080, Figure 2, A and B). This is supported by the use of four different transcription inhibitors, which work over varying time periods (24 hrs in alpha-amanitin, triptolide, or flavopiridol; 1.5 hrs in actinomycin D; Figure 2C). We also performed timelapse imaging during drug treatment to show that the transcription inhibitors for which we used 24-hour incubation times can suppress nuclear blebs within 8 hours (Supplemental Figure 1B). We also show that nuclear bleb formation and stability in wild type cells are transcription dependent (Figure 5). We believe the new data added in our revised manuscript address the Reviewer’s concern that the findings were specific to the combination of VPA and alpha-amanitin.

      Reviewer #2 (Significance (Required)):

      The authors present data supporting the potential involvement of active transcription in the formation of nuclear blebs when the global deacetylase inhibitor valproic acid (VPA) has been applied to cells.

      While somewhat interesting, this is a rather specific condition that is further restricted by the limited use of experimental approaches. For example, the only deacetylase inhibitor used is VPA. Is this because VPA is the only one to trigger the effect? The authors should expand their approach to include additional inhibitors or, preferably, a directed knockdown tactic that targets the specific HDACs driving their phenomena.

      The Reviewer is concerned that we have used limited experimental approaches by focusing on VPA treatment to induce nuclear blebs and alpha-amanitin overnight treatment to suppress nuclear blebbing. VPA treatment is a well-established perturbation to induce nuclear blebbing via HDAC inhibition, and it is similar to a variety of other nuclear perturbations that also induce blebs (Stephens et al. MBoC 2018, 2019; Kalinin et al. MBoC 2021; Pho et al. biorxiv 2022).

      Nonetheless, to clearly address the Reviewer’s concerns, we have provided new data showing that blebbing induced by four different nuclear perturbations is suppressed by transcription inhibition and that four different transcription inhibitors suppress nuclear blebbing. In addition to these perturbations, we also note that transcription inhibition affects bleb formation and stability in wild type cells. Below we outline the diverse experimental approaches that support the major conclusion of our manuscript.

      Our data show that transcription inhibition suppresses nuclear blebbing across:

      1. Multiple cell lines (MEF and HT1080, Figure 2, A and B) – original data
      2. Multiple transcription inhibitors (Figure 2C and Supplemental Figure 1):
         a. Alpha-amanitin (RNA pol II and III degradation) – original data
         b. Triptolide (RNA pol I and II initiation inhibition) – new data
         c. Flavopiridol (RNA pol II elongation inhibition) – new data
         d. Actinomycin D (DNA intercalation) – original data
      3. Multiple perturbations that cause nuclear blebbing (Figure 2D and Supplemental Figure 1):
         a. VPA, a histone deacetylase inhibitor that increases euchromatin and chromatin decompaction; used because it is the most highly studied treatment by our lab (Stephens et al., 2017, 2018, 2019 MBoC; Pho et al., 2022 biorxiv) – original data
         b. DZNep, a histone methyltransferase inhibitor that decreases heterochromatin and decompacts chromatin (Stephens et al., 2018, 2019 MBoC) – new data
         c. Lamin B1 null cells (LMNB1-/- or LB1-/-) (many previous works, including Stephens et al. MBoC 2018) – original data
         d. Lamin A constitutive knockdown cells (LA KD) (Vahabikashi et al., 2022 PNAS) – new data
      4. Nuclear bleb formation and stabilization depend on transcription in wild type cells as well as in VPA-treated cells (Figure 5) – original data
      5. Time dependence of suppression of nuclear blebbing, requested by Reviewers 2 & 3:
         a. Actinomycin D treatment of 1.5 hrs is sufficient to suppress nuclear blebs (Figure 2C) – original data
         b. Transcription inhibition with alpha-amanitin, triptolide, and flavopiridol all show an increased rate of nuclear bleb reabsorption in the first 8 hrs of treatment for both VPA and LMNB1-/- perturbations (Supplemental Figure 1B) – new data
            i. These new data indicate that even already-formed blebs require active transcription to persist for long times.
            ii. These new data also show that the effect of transcription inhibition on nuclear blebbing does not require 24 hours of treatment.

      __Moreover, the authors imply that VPA works through histone deacetylation yet do not provide direct evidence. It is equally likely that the application of VPA alters the acetylation pattern of a non-histone protein that eventually alters nuclear blebbing. __

      The Reviewer questions whether inhibition of histone deacetylation by VPA treatment is responsible for nuclear blebbing. As the Reviewer notes in their next point below, inhibition of histone deacetylation (e.g., by VPA or TSA treatment) as a mechanism for nuclear blebbing was previously established by work from our lab (Stephens et al., 2018 and 2019 MBoC) and others (Kalinin et al. MBoC 2021). This was described and referenced in the original manuscript’s introduction.

      To summarize previous work, inhibition of histone deacetylation by VPA increases histone acetylation/euchromatin (Göttlicher et al. EMBO J 2001; Krämer et al. EMBO J 2003) and induces chromatin decompaction (Stypula-Cyrus et al. PLoS One 2013, Lleres et al. J Cell Bio 2009). In turn, this softens the nucleus (Stephens et al. MBoC 2017; Shimamoto et al. MBoC 2017), which succumbs to nuclear blebbing (Stephens et al., MBoC 2018). The same softening and blebbing can also be induced by histone hyperacetylation via TSA or histone demethylation via DZNep (Stephens et al., MBoC 2018). This effect can be reversed by chromatin compaction via increased histone methylation/heterochromatin formation (Stephens et al. MBoC 2019).

      In the present work, we measured histone acetylation (H3K9ac) in both VPA and VPA+alpha-amanitin perturbations to ensure that alpha-amanitin does not simply reverse the increase in VPA-based histone acetylation and thereby decrease nuclear blebbing, which it does not (Figure 3, A and B).

      Altogether, inhibition of histone deacetylation by VPA as a mechanism for nuclear blebbing is established by the previous literature. The present work builds on those results to uncover a surprising new driver of nuclear blebbing: transcription. Therefore, we consider it unnecessary to provide further confirmatory measurements of VPA-treated cells beyond what is already provided in the manuscript. Finally, we point to the inclusion of new data from three other nuclear perturbations that cause nuclear blebbing that can be suppressed by transcription inhibition (Figure 2).

      __Regardless, the reported findings with VPA were previously reported (Stephens et al. 2018) and the influence of alpha amanitin only represents an incremental advancement in our understanding of nuclear blebs. __

      The finding that alpha-amanitin inhibits nuclear blebbing implies that a previously unknown mechanism/pathway, involving an essential genomic process, is critical to nuclear shape regulation. We therefore strongly disagree with the Reviewer that bleb inhibition upon alpha-amanitin treatment represents an incremental advance.

      Moreover, the existing literature generally argues that nuclear blebbing is caused by actin-based compression and confinement. It is widely believed that the cytoskeleton deforms the nucleus, which can herniate a nuclear bleb in softer nuclei. Here, we show that with transcription inhibition there are no overt changes to actin contraction (Supplemental Figure 2), actin confinement (Figure 3E), and nuclear mechanics (Figure 3G). However, levels of blebbing change anyway! This will be a new and surprising result to those who believe the current prevailing narrative from the literature. We have now shown for the first time that transcription is also needed to form and stabilize nuclear blebs; to our knowledge, this was almost entirely unknown until now.

      Further supporting our belief in the significance of our findings, Reviewer #1 and Reviewer #3 clearly state that our work is novel and important:

      Reviewer #1 “To my knowledge this is the first report of this phenomena, and thus will be impactful to the nuclear mechanics field.”

      Reviewer #3 “This is an interesting study that shows, for the first time, that inhibition of transcription reduces the occurrence of nuclear blebs in cells that have been pre-treated with valproic acid.”

      To address the Reviewer’s concern, we have revised the manuscript to clarify that active transcription is required to form nuclear blebs across all of the perturbations now presented in this manuscript. Furthermore, we have clarified that transcription inhibition appears to suppress blebbing without altering other cellular components and properties (actin, nuclear stiffness) that are widely believed to control blebbing (see Results page 7, Results page 10, Discussion page 14).

      Adding to the concern is that actinomycin D does not have the same level of influence as alpha amanitin (Figure 2), which suggests the alpha amanitin is having a pleiotropic impact on blebbing. To validate that the changes in blebbing in the presence of VPA are dependent upon active transcription, the authors should use the anchor-away technique to remove RNAP from the nucleus thereby avoiding any indirect effects of the drugs (i.e., alpha amanitin) in use. Further adding concern that it is an indirect outcome is the prolonged incubation period (16-24 hours) that is apparently needed to observe the changes (page 5 paragraph 4). If it is active transcription that is causing the change in blebbing, then this should be apparent in a much shorter time frame (

      The Reviewer is worried about possible differences between the transcription inhibitors actinomycin D and alpha amanitin. To further address these concerns in the revised manuscript, we now present new data for VPA without transcription inhibitor and VPA with transcription inhibition by four different transcription inhibitors (Figure 2C). Inhibitors include alpha-amanitin (RNA pol II degradation), triptolide (transcription initiation inhibition), flavopiridol (transcription elongation inhibition), and actinomycin D (DNA intercalation). All VPA plus transcription inhibitor treatments result in a significant decrease in nuclear blebbing relative to VPA treatment alone (p (p > 0.05, Figure 2C). Thus, there is no significant difference in the degree of nuclear blebbing suppression between the four different transcription inhibitors used.

      Furthermore, the Reviewer raises concerns about the time interval from the start of transcription inhibitor treatment to suppression of nuclear blebbing. We agree that considering this time interval is valuable. However, we need to consider that the time interval for each of the different transcription inhibitors to take effect is different (Bensaude 2011 Transcription). Alpha-amanitin inhibits transcription in 4-8 hours (10 µM, Nguyen et al., 1996 NAR), triptolide (1 µM, Chen et al. 2014 Genes Dev) and flavopiridol (0.5 µM, Chen et al., 2005 Blood) work in 2-4 hours, and actinomycin D works in about 1 hour (10 mg/mL, Lai et al. 2019 Methods). These times are now mentioned in the manuscript (Figure 2 legend and Methods section).

      It was not, however, known in advance how long it would take for transcription inhibition to have an effect on nuclear morphology. Therefore, the time to observe bleb suppression could have been longer than these treatment durations. As mentioned above, treatment with actinomycin D for 1.5 hours results in a decrease in nuclear blebbing similar to that produced by the other inhibitors with 24-hour treatment (Figure 2C). To further address these concerns, we provide new data in the revised manuscript tracking nuclear bleb reabsorption by live cell imaging during the first 8 hours of treatment with alpha amanitin, triptolide, and flavopiridol. Nuclear bleb reabsorption for both VPA and LMNB1-/- perturbations goes from ~5% to 30% or greater during the first 8 hours of treatment with each of the transcription inhibitors (Supplemental Figure 1B), consistent with the time required to fully inhibit transcription. This supports our conclusion that transcription is essential to stabilizing nuclear blebs.

      __In addition to these issues, the authors rely on immunofluorescence signals to measure the levels of various factors including the Ser5 and Ser2 phosphorylation, which is capturing the total levels of these factors and not the DNA bound forms. If the changes in blebbing actually involve transcription initiation, then the authors should include measurements on the DNA-bound factors. __

      We measure Ser5 and Ser2 phosphorylation of RNA polymerase to track the population that is actively transcribing DNA. These marks appear on DNA-bound RNAP. Ser5 and Ser7 of RNAP are phosphorylated during initiation and subsequently dephosphorylated during transcription elongation, while Ser2 phosphorylation is added at that time (Hsin and Manley 2012 Genes Dev). Ser2 phosphorylation is removed at transcription termination. Therefore, we expect immunofluorescence to measure DNA-bound RNAP.

      __As reported the authors conclude that there is no changes in Ser2 and Ser5 phosphorylation yet they report that total RNA levels rise (Figure 1). How is the disconnect between RNA levels and Ser2 and Ser5 phosphorylation occurring? __

      The Reviewer raises a question about how VPA treatment increases RNA levels but not levels of active RNA pol II Ser2 and Ser5 phosphorylation. While this is an interesting question, without a dedicated investigation we can only speculate; this question is beyond the scope of the paper, which is focused on how transcription inhibition suppresses nuclear blebbing. The point of these data is to show that treatment with alpha-amanitin, alone or together with VPA, decreases both RNA and RNA pol II Ser2 and Ser5 phosphorylation, confirming transcription inhibition.

      __Comparably, they use H3K9ac immunofluorescence as a measure of euchromatin. While the authors might be gaining a view on the total levels of H3K9ac under these experimental conditions, it is not clear whether this is DNA associated or not. Minimally, the authors should perform ATAC-Seq to judge the changes in euchromatin. __

      The Reviewer questions the use of H3K9ac immunofluorescence as a measurement of euchromatin levels, particularly in VPA-treated cells. The relationship between VPA and chromatin decompaction / euchromatin levels has been previously established (e.g., Stypula-Cyrus et al. PLoS One 2013, Felisbino et al. J Cell Biochem 2014, Lleres et al. J Cell Bio 2009). New data in Figure 3B show that the heterochromatin marker H3K9me2,3 is also not altered by alpha-amanitin treatment. In the case of VPA + alpha-amanitin treatment, micromanipulation and nuclear height measurements provide further evidence that chromatin decompaction remains, since the chromatin-based force response is unchanged from VPA treatment alone (Figure 3, E and G).

      Again, we note that our manuscript focuses on the effects of transcription on nuclear blebbing and rupture, which were not previously reported and differ from the current understanding in the literature. Furthermore, ATAC-seq is a major undertaking that is simply not appropriate for further proving an auxiliary point about a previously established effect.

      In summary, the original manuscript addresses this point. The specific experiment requested by the Reviewer is not necessary and is far beyond the scope of this study.

      A final major concern is the lack of a correlation between the blebbing and nuclear ruptures (page 7 paragraph 3; Figure 4). If ruptures are not correlating with the blebbing, what is the relevance of the blebbing?

      The Reviewer is asking for a clarification of the importance of nuclear blebbing in relation to nuclear ruptures. We have revised the manuscript to add new text to the Figure 4 legend clarifying the measurements and to the Discussion section describing the importance of this data (Discussion pages 12-13 and page 14). We discuss this in more detail below.

      We would like to clarify that blebbing and nuclear rupture are not uncorrelated, as suggested by the Reviewer. We and others have shown that nuclear blebs are sites of high curvature that result in nuclear ruptures. In the present manuscript, timelapse imaging shows that nuclear bleb formation results in nuclear rupture within minutes in all imaged cases (Figure 5). These data agree with previously published data from our lab showing that bleb formation leads to rupture >95% of the time (Stephens et al., 2019 MBoC). Furthermore, stabilized nuclear blebs persist for hours (Supplemental Figure 1B) and undergo more ruptures, as shown in Figure 4D. Therefore, ruptures remain correlated with nuclear blebs in our study.

      What we have shown, however, is that the percentage of cells that undergo at least one nuclear rupture during the time lapse is not statistically significantly decreased from VPA-treated levels by the addition of alpha-amanitin (Figure 4B). This appears to be due to two factors: 1) a basal level of nuclear rupture (see wild type data in Figure 4) and 2) an increase in the level of non-bleb-based nuclear rupture. However, importantly, non-bleb-based ruptures appear to occur less frequently for cells that undergo nuclear ruptures. Of the cells that exhibit nuclear rupture, those with non-bleb-based ruptures on average undergo only a single rupture over a 3-hour timelapse whereas those undergoing bleb-based rupture undergo an average of > 2 ruptures over the same time (Figure 4D).

      Altogether, these data point to a correlation between blebbing and rupture, where blebbing can promote nuclear rupture but is not essential for rupture. Therefore, observations of blebs are important in that they correspond to increases in nuclear rupture and corresponding nuclear dysfunction, such as DNA damage. The observation of non-bleb-based rupture, while not entirely new (Chen et al. PNAS 2018, Stephens et al. MBoC 2019, Pho et al. bioRxiv 2022), is interesting because it may be driven by a different mechanism; transcription is not essential for nuclear ruptures in the absence of nuclear blebs but promotes rupture in the presence of blebs. These results add to our knowledge of the factors regulating nuclear integrity and shape, and we anticipate that they will be further investigated in future studies.

      Finally, beyond these findings, we speculate that blebbing itself may be harmful to cell nuclear function. Previous studies have observed that nuclear deformations can cause DNA damage (Shah et al. Curr Biol 2021), chromatin reorganization (Jacobson et al. BMC Biol 2018, Golloshi et al. EMBO J 2022), and alterations to mechanotransduction (reviewed in Kalukula et al. Nat Rev Mol Cell Biol 2022). The extent to which the changes associated with these “nuclear deformations” require blebbing, rupture, or both is under investigation by various labs. Furthermore, previous studies (Shimi et al. Genes Dev 2008; Pfleghaar et al. Nucleus 2015) along with the present study (RNA Pol Ser2 and Ser5; Figure 6) have shown that chromatin content and, possibly, functionality are different within the nuclear bleb. Data in another manuscript in preparation from our lab further suggest that there is limited exchange of biomolecular content between the nuclear body and bleb. Therefore, while we cannot conclusively claim that blebs are themselves deleterious to function, there is a growing body of suggestive evidence that this is the case.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This is an interesting study that shows, for the first time, that inhibition of transcription reduces the occurrence of nuclear blebs in cells that have been pre-treated with valproic acid. The data that supports this is in Figure 2, collected in two different cell types (MEFs and HT1080 cells). The effect appears robust. New data is also provided that a marker of initiation of transcription but not transcriptional elongation is enriched in valproic acid-induced blebs.

      We thank Reviewer #3 for the positive comments that our study is “interesting” and “reproducible” and that our data show the effect of transcription on nuclear blebbing “for the first time”.

      This Reviewer asks for clarifications on 1) how transcription is a new mechanism for nuclear bleb formation and not part of the traditional view, 2) the generality of our conclusions (similar to Reviewer #2) since we report “on the inhibition of transient, small, valproic acid-induced blebs by alpha-amanitin”, and 3) the insight the modeling provides. We have provided new data and made changes to the manuscript to address all the Reviewer’s comments.

      __ Major comments

      1. The paper makes general claims about transcription and nuclear shape, when in reality, it is only reporting on the inhibition of transient, small, valproic acid-induced blebs by alpha-amanitin. This scenario under which the experiments were performed, for which there is no obvious physiological counterpart, ought not to be construed to challenge or contrast with the current understanding that the nucleus maintains its shape by resisting cytoskeletal forces. Cytoskeletal forces are well-known to establish nuclear shape; nuclear shape in this context, is generally taken to refer to the gross shape of the nucleus (e.g. elliptical, circular, etc.), and not small local blebs that may form due to F-actin based confinement or other mechanisms. Thus, this interpretation is overstated:

      "Surprisingly, we find that while nuclear stiffness largely controls nuclear rupture, it is not the sole determinant of nuclear shape. This contrasts with previous studies, which suggested that the nucleus maintains its shape by resisting cytoskeletal and/or other external antagonistic forces (Khatau et al., 2009; Le Berre et al., 2012; Hatch and Hetzer, 2016; Stephens et al., 2018; Earle 12 et al., 2020)."

      __

      The Reviewer appears to be concerned with two issues in this comment. First, the Reviewer is concerned about our use of the word shape, which could be interpreted too generally, rather than as categorizing the blebbing and rupture phenomena that we observe in this study. We appreciate the Reviewer’s feedback and have made changes to this sentence as well as the paper in general to clarify that we are focused on nuclear blebs. Second, there is the issue of to what degree our results modify our understanding of the role of nuclear stiffness in nuclear blebbing and rupture. We discuss this below.

      To address the Reviewer’s comment that the results are limited to “the inhibition of transient, small, valproic acid-induced blebs by alpha-amanitin”, we provide new data and context for our results. The revised manuscript includes 1) new data using four transcription inhibitors and four nuclear blebbing perturbations and 2) original data showing that nuclear blebs are persistent rather than small and transient, and that they alter gross nuclear shape. Our results are relevant to a wider range of blebbing/rupture and bleb/rupture suppression scenarios, as exemplified by the different nuclear perturbations, transcription inhibitors, and cell types tested in our experiments, and by the long lifetimes of nuclear blebs. More specifically:

      1) The Reviewer notes that our original studies were done with VPA and alpha-amanitin, similar to Reviewer #2’s concerns. We now provide new data showing that 4 different transcription inhibitors can suppress nuclear blebbing across 2 chromatin and 2 lamin perturbations (Figure 2 and Supplemental Figure 1). Thus, our new data support the idea that transcription is broadly required for nuclear blebbing.

      2) The Reviewer states that blebs are small and transient, and that “shape” is meant to reflect the gross shape (e.g., circular). In fact, blebs are long-lived: we show with new data that most (>95%) VPA and LMNB1-/- blebs remain at the end of an 8-hour timelapse (Supplemental Figure 1B). Furthermore, on average, nuclear blebs account for 15% of the nuclear size in VPA-treated cells (Figure 6E). While not measured in this paper, many studies have shown that nuclear blebs cause gross circularity to decrease significantly and that changes in circularity are associated with nuclear rupture (e.g., Stephens et al. MBoC 2018, Xia et al. JCB 2018). Most recently, we show that nuclear blebs decreased nuclear circularity significantly in another manuscript under review (Pho et al., 2022 biorxiv).

      The Reviewer also argues that our data showing the importance of transcription in nuclear blebbing “ought not to be construed to challenge or contrast with the current understanding that the nucleus maintains its shape by resisting cytoskeletal forces.” We acknowledge that our results are not sufficient to rule out the broad assertion made by the Reviewer. However, our data shows for the first time that nuclear blebbing relies on transcriptional activity, while we measure no change in actin contraction or confinement or nuclear stiffness (respectively, Supplemental Figure 2 and Figure 3, C-E). Consequently, these results are a challenge to the current understanding, which must be updated by our results and future experiments. At the same time, we note that this manuscript’s Discussion section acknowledges that we have data in another preprint in which inhibition of actin contraction decreases nuclear blebbing to near 0% in wild type and perturbations (Pho et al., 2022 biorxiv). Together, these observations suggest a complicated picture in which multiple factors are jointly responsible for regulating nuclear blebbing and rupture.

      __ As an aside, the data in the paper does not appear to support the interpretation that "nuclear stiffness largely controls nuclear rupture". It is unclear what the authors mean by this statement.__

      We originally intended that comment to state the previous understanding in the literature, but we realize it was unclear. We appreciate the Reviewer’s feedback and have revised the text.

      __ 2. Further to point 2, treatment with alpha-amanitin does nothing to the occurrence of blebbing in normal cells. Thus, the data are specifically applicable to valproic acid-treated cells. As such, the broad interpretations related to nuclear shape and mechanics should be tempered.__

      The Reviewer is concerned that we cannot support the claim that this effect is broad and general; these concerns are also raised by Reviewer #2. We have provided new data and highlight original data to support that this effect is in fact broad and general, and moreover, that the data supports a role for transcription in nuclear blebbing.

      We specifically address the Reviewer’s statement: “treatment with alpha-amanitin does nothing to the occurrence of blebbing in normal cells”. In the original manuscript, we provided data that showed that wild type nuclear bleb formation and stability are suppressed upon transcription inhibition (Figure 5) even though the percentage of wild type nuclei exhibiting a bleb is not changed by alpha-amanitin treatment (Figure 2). We also provided data showing that the predominant type of nuclear rupture changes with alpha-amanitin treatment, including in wild type cells (blebbed vs. not, Figure 4C). Thus, while the effects of transcription inhibition are most easily visible in VPA-treated cells, they are also present in wild type cells in how blebs are formed and stabilized (Figure 5). We have revised the manuscript to better highlight this important point.

      In addition, we again emphasize that our results extend beyond VPA-induced blebs. Our revised manuscript now includes new data of 4 different perturbations (to chromatin histone modifications and lamins A and B) that induce nuclear blebs, which can be suppressed by 4 different transcription inhibitors (Figure 2 and Supplemental Figure 1). As previously noted by both Reviewers 1 and 3, this effect is reproducible in different cell lines. This new data directly addresses the concern that the effect is only applicable to VPA and alpha amanitin.

      Nonetheless, we agree with the Reviewer that we cannot support broader claims that nuclear mechanical properties are unaltered by transcription inhibitors across all scenarios, as we only measured this change in VPA-treated cells. Micromanipulation force experiments are detailed and time consuming, making it difficult to include data for multiple perturbations. We chose VPA because we have the most measurements of this perturbation which have remained consistent over the life of micromanipulation force measurements. Therefore, we have revised our statements on nuclear mechanics in the revised manuscript (page 14).

      __ The motor model for RNA pol II activity assumes that the motor 'repels' nearby chromatin units. It is not clear how this is related to the mechanism of motor action of RNA pol II on chromatin during transcription.__

      The point of the model is not to precisely reproduce the manner in which transcribing RNA pol II exerts forces on the chromatin fiber. Instead, we have developed a coarse-grained model to study how the collective activity of molecular motors might drive chromatin dynamics and consequently, changes in nuclear shape, either global or local.

      The model itself is based on our earlier models, which were used to recapitulate and understand how changes to chromatin mechanical properties governed nuclear rigidity (Stephens et al. MBoC 2017, Banigan et al. Biophys J 2017, Strom et al. eLife 2021; also see a similar model by Lionetti et al. Biophys J 2020) and how nonequilibrium activity due to molecular motors, such as RNA pol II, can drive coherent chromatin dynamics (Liu et al. PRL 2021), which have been observed in live-cell imaging experiments (e.g., Zidovska et al. PNAS 2013; Shaban et al. NAR 2018; Shaban et al. Genome Biol. 2020, among others). The current model therefore explores how the newly observed connection between transcription and nuclear blebbing could be explained by known phenomena.

      We note that the "repelling” motors used to model RNA pol II activity in the present work are in many ways qualitatively similar to the dipolar “extensile” motors used by other researchers to model motor-driven chromatin dynamics (e.g., see Saintillan et al. PNAS 2018). More generally, study of “active matter” over the last 20-30 years (and statistical physics over the last century) has shown that precise details of active molecular agents are often unimportant to the larger-scale behavior of the system (e.g., see Marchetti et al. Rev Mod Phys 2013). Thus, we view the repulsive motors as modeling the effective behavior of many RNA pol II within a sub-micron region of chromatin. Better establishing the differences between different choices of motor activities is the subject of a modeling paper in preparation.

      To address the Reviewer’s concern, we have more clearly stated the scientific foundations of the model, and we have revised our description of the model to clarify that we do not intend to model the behavior of individual RNA pol II by individual repulsive motors (see Results section, page 10).
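      To make the idea of a coarse-grained "repelling" motor concrete, the following is a minimal toy sketch and not the simulation code used in the manuscript. It assumes a hypothetical rule in which a motor located at one chromatin bead pushes every other bead within a cutoff distance directly away from itself with a fixed force magnitude; the function name, the cutoff, and the magnitude are all illustrative assumptions.

```python
import math

def motor_forces(positions, motor_index, cutoff, magnitude):
    """Toy repulsive motor: push beads within `cutoff` away from the motor bead.

    positions   : list of (x, y, z) bead coordinates (hypothetical units)
    motor_index : index of the bead carrying the motor
    cutoff      : interaction range of the motor (assumed parameter)
    magnitude   : constant force magnitude applied to each nearby bead (assumed)
    """
    mx, my, mz = positions[motor_index]
    forces = []
    for i, (x, y, z) in enumerate(positions):
        dx, dy, dz = x - mx, y - my, z - mz
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        if i == motor_index or dist == 0.0 or dist > cutoff:
            # The motor bead itself and beads out of range feel no motor force.
            forces.append((0.0, 0.0, 0.0))
        else:
            # Unit vector pointing away from the motor, scaled by the magnitude.
            scale = magnitude / dist
            forces.append((dx * scale, dy * scale, dz * scale))
    return forces

# Hypothetical configuration: motor on bead 0, two beads in range, one out of range.
beads = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 0.0, 0.0)]
forces = motor_forces(beads, motor_index=0, cutoff=3.0, magnitude=1.0)
```

      In a sketch like this, the collective behavior would come from many such motors switching on and off along the polymer, not from the details of any single motor.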

      __The motor model also does not seem to add conclusive insight to the manuscript, as the nuclear shapes predicted are not directly comparable to the experimental shapes which are flat and smooth with only an occasional, single, local bleb. __

      The Reviewer raises two related points with this comment: that bulges and blebs are not directly comparable, and therefore, that the model “does not seem to add conclusive insight to the manuscript.”

      We agree with the Reviewer that bulges in the simulations are not blebs as they are observed in the experiments. However, it seems likely to us that bulges are necessary precursors to bleb formation; it is difficult to envision how a large local nuclear protrusion could form without first bulging outward from the nuclear body. Furthermore, we disagree with the assertion that nuclei are generally flat and smooth, as qualitative and quantitative analysis of imaging data reveals that nuclei exhibit shape fluctuations and irregularities across multiple scales (see, for example, Chu et al. PNAS 2017, Patteson et al. JCB 2019, Stephens et al. MBoC 2019, Liu et al. PRL 2021).

      Nonetheless, the observation of bulges but not blebs is a shortcoming of the simulation model. We believe this shortcoming reflects a tradeoff made in developing this model; we chose to develop and study a model with relative simplicity compared to a real cell nucleus. A more complicated model might better capture some aspects of nuclear blebbing at the expense of additional complexity. For example, the current model does not allow lamin-lamin or chromatin-lamin bonds to rupture, either stochastically or due to high forces. This effect, which is likely present in vivo, might be necessary for generating more bleb-like structures in simulations. Developing and refining such a model is an active pursuit within our collaboration, but for the moment, it is beyond the present purpose of the model.

      Instead, the purpose of the model is to determine whether the observed effect of transcription inhibition on nuclear blebbing / localized shape deformations can be understood through known biophysical phenomena. Established models, to the extent that they exist, were insufficient because they typically relied on nuclear mechanics, whereas our experiments show that transcription does not change nuclear mechanical rigidity. The current model demonstrates how motor activity within chromatin can alter the structure and dynamics of the lamina. The simulations are certainly not proof that transcription affects nuclear blebbing through the proposed mechanism. However, they are a first-of-their-kind demonstration of how nonequilibrium biophysical activity (such as that generated by transcription) within a biopolymer system (chromatin) can emergently alter the geometry of the confining boundary (the lamina). This new result provides a plausible interpretation for the experiments in the manuscript.

      In the revised manuscript, we have clarified our modeling approach and objectives in the Results and Discussion sections, and we have more clearly identified and discussed the limitations of the model (Results pages 10-11, Discussion page 15).

      The model offers 'proof of principle', but is not capable of ruling out alternative mechanisms (such as nuclear pressurization by confinement, chromatin decompaction, or changes to osmotic pressure). It may be more appropriate to include the model in the discussion as opposed to presenting it as a new result that can be reliably interpreted through comparisons with experiment.

      We respectfully disagree with the suggestion to include the model in the Discussion section instead of the Results. As discussed above, the model is new biophysics research and the simulations produced new scientific results, even if the overall interpretation remains open.

      However, we have some thoughts about the alternatives suggested by the Reviewer. These are discussed in detail below, but briefly: experimental data, rather than the model itself, suggest that the alternative mechanisms mentioned by the Reviewer do not explain the effects of transcription. After treatment with alpha-amanitin, we do not observe changes to actin-based confinement or contraction (Figure 3E, Supplemental Figure 2), and there are no changes to chromatin histone modifications or nuclear rigidity (Figure 3). We are also skeptical of osmotic pressure arguments since 1) fluid, ions, and small biomolecules should freely flow through nuclear pores to maintain osmotic pressure balance between the nucleus and the cytoplasm, especially on hours-long time scales, and 2) increasing the osmotic pressure by fragmenting chromatin has previously been observed to have either no effect or a suppressive effect on nuclear stiffness (Stephens et al. MBoC 2017, Belaghzal et al. Nat Genet 2021), which would potentially increase blebbing (the opposite of the effect suggested by the Reviewer). We have addressed this further in the revised Results section (page 10) and below.

      __ 4. The data in the paper is not strong enough to rule out the more conventional mechanism of nuclear pressurization, which could be caused by F-actin based confinement or chromatin decompaction, or changes to osmotic pressure. Immunostaining of myosin is not a reliable way to compare myosin activity across conditions. It is possible that the long treatment with alpha-amanitin (up to 24 h, Fig. 2) relieves the pressure in the nucleus without measurable changes in the already established cell shape and hence the nuclear shape (height changes in spread cells are small at best -- valproic acid appears to reduce height by ~0.5 microns in Figure 3E which is smaller than the optical resolution along the z-axis of a typical confocal microscope).__

      The Reviewer has proposed several alternative mechanisms and questioned the use of immunostaining and nuclear height measurements in the manuscript. We address each of these below.

      Specifically, the Reviewer is concerned that we cannot rule out the more conventionally believed mechanisms of 1) actin confinement, 2) actin contraction, 3) chromatin decompaction, and/or 4) osmotic pressure. We have revised the text to clarify that our data and data from others strongly support that these four “conventional” mechanisms are not responsible for transcription inhibition-based nuclear blebbing suppression (revisions on pages 7, 10, 14).

      1) Actin confinement, as measured by nuclear height, does not change upon transcription inhibition (Figure 3, C-E). Thus, our data support the idea that transcription inhibition suppresses nuclear blebbing through a different mechanism. The Reviewer objects to this measurement on the basis that even the 0.5 µm change observed for VPA-treated cells is below optical resolution. However, optical resolution is not relevant to this measurement because we are not resolving two objects; rather, we are measuring the size of one object, the nucleus.

      When two dots/objects are separated in the same frame or in different z slices, one needs to clearly distinguish the two Gaussian point spread functions of objects a distance X apart. That is resolution, and it is not the relevant limitation here. We measure the size of one object (the nucleus) using the full width at half maximum (FWHM), which can quantify changes in nuclear height at scales finer than the optical resolution. For example, the FWHM of a fluorescent bead can be observed to change by just tens of nm depending on the wavelength of the light emitted; with shorter wavelengths, one has a smaller FWHM (from the Rayleigh criterion, θ = 1.22λ/D, where λ is the wavelength of the light and D is the diameter of the aperture). Our measurements are taken through a z-stack at 200 nm steps; thus, the 0.5 µm change from wild type to VPA-treated corresponds to 2.5 z steps (not smaller than one z step). Finally, we have additional data showing our ability to measure these differences many times over (Pho et al. 2022 biorxiv).

      Image left is from: https://en.wikipedia.org/wiki/Full_width_at_half_maximum

      Image right is a crop of Figure 3D from the manuscript.
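      The distinction between resolving two objects and sizing one object can be illustrated with a short sketch. It is not the analysis pipeline used in the manuscript: it assumes hypothetical Gaussian-shaped axial intensity profiles with widths of 4.0 µm and 3.5 µm (illustrative values only), samples them at the 200 nm z step mentioned above, and estimates the FWHM by linear interpolation at the half-maximum crossings.

```python
import math

Z_STEP_UM = 0.2  # z-stack step size stated in the text (200 nm)

def gaussian_profile(z_positions, center, fwhm):
    """Hypothetical axial intensity profile with a known FWHM (assumption)."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [math.exp(-((z - center) ** 2) / (2.0 * sigma ** 2)) for z in z_positions]

def fwhm_from_profile(z_positions, intensities):
    """Estimate FWHM by linearly interpolating the two half-maximum crossings."""
    half = max(intensities) / 2.0
    above = [i for i, v in enumerate(intensities) if v >= half]
    first, last = above[0], above[-1]

    def crossing(i_a, i_b):
        # Position between samples i_a and i_b where the intensity equals half.
        z_a, z_b = z_positions[i_a], z_positions[i_b]
        v_a, v_b = intensities[i_a], intensities[i_b]
        return z_a + (half - v_a) * (z_b - z_a) / (v_b - v_a)

    left = crossing(first - 1, first)
    right = crossing(last, last + 1)
    return right - left

z = [i * Z_STEP_UM for i in range(61)]  # 0 to 12 µm in 0.2 µm steps
height_a = fwhm_from_profile(z, gaussian_profile(z, center=6.0, fwhm=4.0))
height_b = fwhm_from_profile(z, gaussian_profile(z, center=6.0, fwhm=3.5))

difference_um = height_a - height_b
difference_in_steps = 0.5 / Z_STEP_UM  # the 0.5 µm change expressed in z steps
```

      Under these assumptions, the interpolated widths recover the 0.5 µm difference between the two profiles even though each profile is sampled only every 0.2 µm.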

      2) Actin contraction, as measured by γMLC2, does not change either (Supplemental Figure 2). However, we know that actin contraction is a major determinant of nuclear blebbing (Mistriotis et al., 2019 JCB and Pho et al., 2022 biorxiv). Therefore, our data support that transcription affects blebbing in some other way than actin contraction.

      The Reviewer disputes this finding by stating that “immunostaining of myosin is not a reliable way to compare myosin activity across conditions.” Published reports show that γMLC2 immunostaining is a reliable way to measure actin contractility changes (Wan et al. MBoC 2012; Ramachandran et al. Mol Vision 2011; Duan et al. Cell Cycle 2016; Nishimura et al. PLOS One 2020). We have another preprint showing that alterations to actin contraction as measured by immunostaining of phosphorylated myosin light chain 2 (γMLC2) determine nuclear blebbing, independent of changes in actin confinement (Pho et al., 2022 biorxiv). There, we clearly show that changes in γMLC2 immunostaining can measure changes in actin contraction due to well-established modulators. Similarly, the ROCK inhibitor Y27632 in Supplemental Figure 2 can be viewed as a positive control in that γMLC2 immunostaining is clearly decreased after treatment with the inhibitor.

      3) Chromatin decompaction, as measured by H3K9ac, and chromatin-based nuclear rigidity are not rescued by transcription inhibition. New data also show that levels of the heterochromatin marker H3K9me2,3 do not change upon transcription inhibition (Figure 3B). The new data presented in this manuscript show that transcription inhibition also suppresses blebbing in DZNep-treated cells (Figure 2D), where chromatin compaction by heterochromatin formation is inhibited (Stephens et al. MBoC 2019). Together, these experiments suggest that transcription inhibition is not suppressing nuclear blebs through increases in heterochromatin-based chromatin compaction.

      Furthermore, the lack of change in the measurement of nuclear stiffness via micromanipulation (Figure 3G) provides a complementary metric suggesting that chromatin compaction is unchanged, at least in the case of VPA + alpha-amanitin.

      Altogether, these results are inconsistent with transcription inhibition suppressing blebs through alterations to chromatin compaction.

      4) Osmotic pressure is the least established, if established at all, of the four “traditional” mechanisms. The Reviewer proposes that transcription inhibitors, such as alpha-amanitin, could relieve osmotic pressure within the nucleus. We disagree with this explanation in that it is implausible for the nucleus to maintain an osmotic pressure imbalance in VPA-treated cells over long periods of time. Fluid, ions, and small biomolecules likely can flow through nuclear pores to maintain osmotic balance between the nucleoplasm and cytoplasm, especially over the hours-long duration of VPA treatment. Furthermore, we are skeptical that VPA treatment, even with its chromatin-decompacting effects, significantly increases osmotic pressure because nuclear stiffness actually decreases after VPA treatment (Stephens et al. MBoC 2017, 2018, 2019; Krause et al. Phys Bio 2013; Shimamoto et al. MBoC 2017; Hobson et al. MBoC 2020). Increased osmotic pressure should cause the nucleus to be stiffer. Moreover, nuclei in VPA-treated cells consistently undergo blebbing and rupture, which would naturally relieve any pressure imbalance. Thus, the notion that the measurements after hours of VPA or VPA + alpha-amanitin treatment (Figures 2-5) are the result of a steady-state change in osmotic pressure is simply inconsistent with the experimental data.

      We note that in cases of acute osmotic shock, where the osmotic pressure balance of the nucleus may be altered, the nucleus changes in size (e.g., see Finan et al., 2009 Ann Biomed Eng), which we do not observe in our experiments. Our measurements of nuclear area (Figure 6C) and height (Figure 3E) show no change in nuclear size upon transcription inhibition (for more on the issue of height measurement, see the previous point).

      To further address concerns about overnight treatment causing off-target effects, we have provided new data from a shorter treatment duration in the manuscript. The new data shows that within 8 hours, blebs exhibit more reabsorption after alpha-amanitin, triptolide, and flavopiridol treatment in both VPA-treated and LMNB1-/- cells (Supplemental Figure 1B). Additionally, we note that actinomycin D decreased nuclear blebbing in 1.5 hours, and thus did not require overnight treatment.

      In summary, our original and new data clearly show that transcription contributes to nuclear blebbing. Transcription inhibition does not change other factors (such as actin-based confinement or contraction, chromatin compaction, or osmotic pressure) that have been shown or may be thought to contribute to nuclear blebbing. The revised manuscript addresses this issue through the inclusion of new data, as discussed above.

      __

      Further to point 4, the data in Figure 4B and 4D both show a decrease in the mean of the % of ruptured nuclei and rupture frequency (please provide units for this frequency on the Y-axis). With more experiments, perhaps the data would have reached statistical significance?__

      The Reviewer is asking for clarification on the data included in Figure 4 B and D reporting the percentage of cells that display a nuclear rupture.

      We have revised the manuscript to clarify that Figure 4B is the percentage of all nuclei that show at least one nuclear rupture. The measurement unit, percent (listed as “[%]”), is shown on the y-axis. The revised manuscript also clarifies that Figure 4D reports, for the nuclei that rupture, the average number of times a nucleus ruptures during the 3-hour time-lapse.

      The Reviewer states that “with more experiments, perhaps the data would reach statistical significance?” To address this comment, we have altered the text to explain that the percentage of all nuclei that rupture at least once shows a decrease that is not statistically significant by t-test. The data in Figure 4B show that VPA causes 18.5 +/- 2.7 % rupture and VPA+alpha-amanitin causes 12.4 +/- 1.5 % rupture. Student’s t-test gives p = 0.08, which is not statistically significant (p > 0.05), for six biological replicates, each consisting of n = 100-300 cells. We feel the data speak for themselves without us doing more experiments with the sole purpose of getting a lower p value. The stronger data are in Figure 4D, which clearly shows fewer nuclear ruptures per nucleus. We appreciate the Reviewer’s perspective and have modified the text in the Results and Discussion sections to reflect these important points (pages 8 and 14).
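      The reported p value can be approximately reproduced from the summary statistics alone. The sketch below is illustrative only and rests on two assumptions that are not stated in the text: that the +/- values are standard errors of the mean across the six biological replicates, and that the test is a two-sided, equal-variance Student's t-test. The function names are hypothetical, and the p value is obtained by numerically integrating the t density with Simpson's rule so that only the standard library is needed.

```python
import math

def t_from_summary(mean_a, sem_a, mean_b, sem_b, n_a, n_b):
    """Pooled-variance Student's t statistic from means and SEMs (assumed SEM)."""
    var_a = sem_a ** 2 * n_a  # convert SEM back to sample variance
    var_b = sem_b ** 2 * n_b
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se_diff = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / se_diff, df

def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    norm = math.gamma((df + 1) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return norm * (1.0 + x * x / df) ** (-(df + 1) / 2.0)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p value via Simpson's rule on [0, |t|] (steps must be even)."""
    t_abs = abs(t_stat)
    h = t_abs / steps
    total = t_pdf(0.0, df) + t_pdf(t_abs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2.0 * total * h / 3.0  # P(|T| < |t|), using symmetry
    return 1.0 - central_area

# Values from the response: VPA 18.5 +/- 2.7 %, VPA + alpha-amanitin 12.4 +/- 1.5 %, n = 6 each.
t_stat, df = t_from_summary(18.5, 2.7, 12.4, 1.5, 6, 6)
p_value = two_sided_p(t_stat, df)
```

      Under these assumptions, the t statistic is close to 2 with 10 degrees of freedom, and the resulting p value falls between 0.05 and 0.10, consistent with the reported p = 0.08.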

      __ Minor comments.

      1. Confirmatory data, which has already been published in the same cell line in the past, could be moved if possible to supplemental information. Figure 1 seems to be a characterization of the efficacy of alpha-amanitin which is well-known, and therefore does not represent an original finding. It should perhaps be in supplemental information.__

      We understand the Reviewer’s point but would like to leave Figure 1 as a main text figure to provide a clearer story for all readers of our manuscript.

      __ 2. Did the counting method used to collect data in Figure 4B exclude nuclei that rupture multiple times? This should be specified in the manuscript.__

      No. Figure 4B reports the percentage of nuclei that rupture; each nucleus that ruptures is counted once, regardless of how many times it ruptures. We have revised the Figure 4 legend to clarify this point.
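      The difference between the two rupture metrics (the percentage of nuclei that rupture at least once, as in Figure 4B, and the average number of ruptures among the nuclei that rupture, as in Figure 4D) can be shown with a short sketch. The per-nucleus counts below are made-up illustrative numbers, not data from the manuscript, and the function name is hypothetical.

```python
def rupture_metrics(ruptures_per_nucleus):
    """Return (percent of nuclei with >= 1 rupture, mean ruptures per ruptured nucleus)."""
    n_total = len(ruptures_per_nucleus)
    ruptured = [count for count in ruptures_per_nucleus if count > 0]
    # Figure 4B-style metric: each ruptured nucleus counts once, however often it ruptures.
    percent_ruptured = 100.0 * len(ruptured) / n_total
    # Figure 4D-style metric: average rupture count, restricted to nuclei that ruptured.
    mean_ruptures = sum(ruptured) / len(ruptured) if ruptured else 0.0
    return percent_ruptured, mean_ruptures

# Hypothetical rupture counts for ten nuclei over one timelapse.
example_counts = [0, 0, 0, 1, 0, 3, 0, 0, 2, 0]
percent_ruptured, mean_ruptures = rupture_metrics(example_counts)
```

      In this made-up example, three of ten nuclei rupture, so the first metric is 30%, while the second metric is 2 ruptures per ruptured nucleus; the two quantities can therefore change independently.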

      __ 3. This statement should be rephrased: "Since transcription is needed to form and stabilize nuclear blebs, at least some aspect of nuclear shape deformations appears to be non-mechanical" - deformation in the model in Figure 7 is clearly 'mechanical' - driven by motor force.__

      We appreciate the Reviewer’s feedback and have changed the text to “independent of the bulk mechanical strength of the nucleus”.

      __ 4. It is important to specify the times for which cells were treated with the various drugs in each figure (and not just in figure 2).__

      We appreciate the Reviewer’s feedback and have added this information to each figure legend.__

      __

      __

      Reviewer #3 (Significance (Required)):

      This paper reports new data that nuclear blebbing induced by treatment with valproic acid can be inhibited by co-treatment with alpha-amanitin. The data provided are reproducible across different cell lines. The data suggest that inhibition of transcription inhibits blebs which are induced by valproic acid treatment, but it does not inhibit blebs in cells untreated with valproic acid. Immunostaining reveals some enrichment of RNA pol II phosphorylated at Ser5 in valproic acid-induced blebs, suggesting an enhancement of transcription-initiation (but not transcriptional elongation) in the bleb. Alpha-amanitin treatment reduces bleb formation and bleb lifetime.

      While the data are clearly presented, and interesting in terms of relating transcription to blebbing, the proposed interpretation in terms of a new mechanism of blebbing is not strongly supported by the data or by the computational model. More definitive evidence is required to rule out the possibility that blebbing in valproic acid treated cells is caused by a pressurization of the nucleus due to valproic acid treatment, which could be released by treatment with alpha-amanitin for up to 24 h. The manuscript generalizes the findings to 'nuclear shape', and interprets them as suggestive of an alternative mechanism of establishment of nuclear shape; this generalization seems unsupported by the data.__

      Overall, the data provided is novel and interesting to cell biologists, provided more definitive evidence can be provided to rule out other models and to establish the new proposed model for nuclear blebbing. Else, the claims of an alternative mechanism for blebbing could be toned down, and the data on the relation between transcription and blebbing, which is the novel and interesting finding in this paper, could be presented in a more focused way.

      We appreciate that the Reviewer points out that “the data are clearly presented and interesting” and “reproducible across different cell lines.” The Reviewer’s main concerns appear to be with: 1) the effect of transcription inhibition on blebbing that is not induced by VPA, 2) alternatives or limitations to our proposed interpretation of the results, and 3) describing our results as applicable to “nuclear shape” in general.

      We have addressed each of these concerns in detail in the above response and the revised manuscript. To summarize:

      • We have included new data to show that four different transcription inhibitors combined with four different nuclear perturbations exhibit the same effects (Figure 2 and Supplemental Figure 1). Furthermore, we have clarified in the revised manuscript that even wild type (“untreated”) nuclei exhibit changes to blebbing dynamics (decreased stability, increased reabsorption) after transcription inhibition (Figure 5). Finally, concerns about time intervals were addressed by time lapse imaging showing that bleb reabsorption (return to normal shape) increases six-fold in the first 8 hours of transcription inhibitor treatment (Supplemental Figure 1B).
      • The original manuscript, new data, and previous data from the literature provide evidence that alternative mechanisms involving “pressurization” (discussed above), the actin cytoskeleton (Figure 3E and Supplemental Figure 2), and chromatin and nuclear rigidity (Figure 3) do not explain the observed effects of transcription inhibition. We discuss this in detail in the revised manuscript and the above response. Furthermore, we have revised our presentation and discussion of the simulation model to describe its relevance to the results more clearly, support its inclusion in the manuscript, and provide appropriate caveats on our computational findings.
      • We have revised the manuscript to clarify that our results primarily concern nuclear blebbing and rupture. The Reviewer is correct that the current investigation does not particularly focus on larger-scale shape such as circularity/ellipticity. In summary, our data clearly indicate that transcription contributes to nuclear blebbing and rupture. Previously suggested mechanisms of blebbing are generally inconsistent with the observed effect in combination with our other measurements. The model investigates a plausible new, complementary mechanism, which in itself represents an advance in biophysical modeling and ties the manuscript together.

      We thank the Reviewer for their thorough critique, which we have now addressed. We believe that the new experimental data, analysis, and computational modeling in our manuscript significantly advance our overall understanding of nuclear blebbing, even as they raise new questions to be addressed by future work.

    1. Author Response

      Reviewer #1 (Public Review):

      The Introduction starts by setting up a straw-man argument, claiming that the assumption is that gene expression is set up as stable expression domains that undergo little or no subsequent change. I don't think that any current developmental biologist thinks this is true. The references used to support this claim are from the 1990s up to the early 2000s. There are numerous examples since then that show that developmental gene expression is dynamic as a rule.

      Our argument might seem like a strawman to the sector of developmental biologists who work in the field of pattern formation or who are aware of the latest advances in the field. However, a look at current publications on developmental enhancers reveals that the dominant model with which enhancer biologists interpret their data is still the French Flag model (specifically, the eve-stripe-2 model of enhancer function). We meant to address this audience, and attempted to clarify this from the very beginning by stating that “Much of our models of how enhancers work during development relies on the assumption that …”. Please note that we are talking here about “models of how enhancers work”, not models of pattern formation in general.

      The Introduction then continues as a rather detailed review of enhancers, Tribolium methodology, tools for identifying enhancers, and more. The Introduction cites 99 references, which seems excessive for what is essentially an experimental paper. Significant parts of the Introduction can be trimmed or removed. There is no need to mention all the tools available for Tribolium if they are not used in the described experiments. A thorough analysis of the advantages and disadvantages of different modes of ATAC-seq is also beyond the scope of the Introduction. The authors should explain why they chose the tools they chose without excessive background.

      In the revised manuscript, we shortened the discussion of Tribolium methodologies and imaging techniques. However, we think that the paragraph discussing ATAC-seq strategies is important to justify why we took the effort to cut the embryos and perform tissue-specific ATAC-seq analysis instead of performing whole-embryo ATAC-seq.

      Having said that, the Introduction actually overlooks a lot of significant work that is relevant to the subject of the paper. Specifically, the authors completely ignore all of the work on development in hemimetabolous insects such as Oncopeltus and Gryllus - the omission is glaring. There has been a lot of relevant work on dynamic gene expression patterns coming out of these species.

      You are right, and we apologize for the omission. We have now added citations to relevant work on those two insects to the manuscript.

      The experimental setup involves cutting embryos into three sections at two time points. The results then discuss differences in "space" and "time" but there is no discussion of the embryological meaning of these terms. What is happening at the two time points from a developmental perspective? What is the difference between the three sections? There is a lot of relevant development going on at these stages and important regional differences, which have been well-studied in Tribolium and in other insects but are not even mentioned.

      A good point. Correlating chromatin landscape changes with embryological events is an interesting question that requires further analysis and the application of ATAC-seq to additional time points. We chose to leave this to future work (possibly using single-cell ATAC-seq). In this work, we restricted our analysis to the benefits of applying time- and tissue-specific ATAC-seq in predicting active enhancers. We added a note on this point in the Discussion.

      In the preliminary results of the ATAC-seq analysis, it is clear that there are significant differences between the sections, which should come as no surprise, but fairly minor differences between the same section at the two time points. This could be because the two time points are pretty close together at a stage when there is a lot of repetitive patterning going on. A possible interpretation, which the authors don't mention because it goes against their main thesis, is that maybe most of the processes that are taking place at this stage are not dynamic enough to show up at the temporal resolution they have applied. This is worth at least a mention.

      We agree with this observation. We would like to draw the reviewer’s attention to our statement “Together, our findings indicate that changes in chromatin accessibility in Tribolium at this developmental stage are primarily associated with space rather than time…”. A detailed analysis of the chromatin dynamics across time would require more time points, which is something we plan to do in future work.

      The authors link each accessible site to the nearest gene when looking at putative enhancer function. This is a risky assumption since there are many examples of enhancer sites that are far upstream or downstream of the target gene and often closer to an unrelated gene than to the target gene. The authors should at least acknowledge this problem with their functional annotation.

      The reviewer is correct in that, in particular for large eukaryotic genomes, enhancers are often located far away from their target genes. We have no comprehensive enhancer-target data that would enable us to perform a more accurate analysis. Furthermore, the assumption that at least for some of the enhancers the nearest genes will also be their targets, and hence, provide insight into the function of the enhancers themselves seems reasonable given the relatively compact organization of the Tribolium genome. In any case, the analysis was just presented as one of several sanity checks for our ATAC-seq data; for the sake of streamlining the manuscript we no longer include this analysis in the current version of the manuscript.
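      The nearest-gene heuristic discussed in this exchange can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline used in the manuscript: the function name, gene names, and coordinates are hypothetical, and genes are represented by a single position (for example, a transcription start site) on one chromosome. The sketch returns the distance alongside the gene so that distant, and therefore less trustworthy, assignments can be flagged.

```python
from bisect import bisect_left


def nearest_gene(peak_mid, genes):
    """Return (gene_name, distance) for the gene closest to a peak midpoint.

    `genes` is a non-empty list of (position, name) tuples sorted by position,
    all on the same chromosome. Hypothetical helper for illustration only.
    """
    positions = [pos for pos, _ in genes]
    i = bisect_left(positions, peak_mid)
    # Only the genes immediately left and right of the peak can be nearest.
    candidates = genes[max(i - 1, 0): i + 1]
    pos, name = min(candidates, key=lambda g: abs(g[0] - peak_mid))
    return name, abs(pos - peak_mid)


# Hypothetical annotation: three genes on one chromosome.
genes = [(1000, "geneA"), (5000, "geneB"), (12000, "geneC")]
assignment = nearest_gene(4200, genes)
```

      As the reviewer notes, proximity does not establish regulation, so any such assignment is a heuristic annotation rather than evidence of an enhancer-target relationship.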

      In the Discussion, the authors claim that contrary to how it may seem, the question they are addressing is not a "fringe problem". Once again, I think this is a straw man. No active researcher thinks that the question of dynamic regulation of gene expression during development is a fringe problem. On the contrary, most researchers will accept that this is one of the most interesting and important questions in current developmental biology.

      This whole argument was removed from the Discussion in the revised manuscript.

      Perhaps the most significant problem with the manuscript is that it is all built around the premise of enhancer switching between dynamic enhancers and static enhancers. The authors find one site that is consistent with their prediction for a dynamic enhancer and one site - regulating a different gene - that is consistent with their prediction for a static enhancer and claim that they have provided support for their model. I think this claim is grossly exaggerated. They present data that can be seen as consistent with their model but are a long way from providing evidence for it.

      We actually thought we were cautious enough about this. Nowhere in our text did we mention that our data “support” the enhancer switching model. We stated quite early (in the abstract, actually) that:

      “We found our data consistent with a model in which the timing of gene expression during embryonic pattern formation is mediated by a balancing act between enhancers that induce rapid changes in gene expressions (that we call ‘dynamic enhancers’) and enhancers that stabilizes gene expressions (that we call ‘static enhancers’).”

      To make this message clearer, we added the following sentence to the abstract of the revised manuscript: “However, more data is needed for a strong support for this or any other alternative models.” And again at the end of the Introduction: “While these data are in line with our Enhancer Switching model, more data is needed as a strong support for the model.” Also, at the end of the Results section examining runB enhancer dynamics, we stated: “However, this merely shows that runB activity dynamics are consistent with our model, but is still far from strongly supporting the model (more on that in the Discussion).” Also for the Results section on enhancer hbA dynamics: “Again, this merely shows that hbA activity dynamics are consistent with our model, but is still far from strongly supporting it.”

      Moreover, in the opening paragraph of the Discussion, we explicitly and openly addressed this point, and suggested what kinds of observations and experiments would be needed in the future to qualify as “strong support” for the model. We even ran simulations of what one should expect to observe in enhancer deletion experiments if the model is correct (Figure 7).

      However, it seems that discussing the enhancer switching model in detail gives the impression that it is of central importance to the paper. In our view, our experimental system is quite general and does not depend on that model; the point of mentioning it is that it is an example of how an alternative model of enhancer regulation could be relevant to the problem of dynamic gene expression. This would not be obvious without this or a similar model, even if it is hypothetical. But since our presentation evidently gives the impression that our claims are stronger than they really are, we altered our phrasing in the introduction of the revised manuscript to make our point clearer:

      “Despite its potential inaccuracies, the Enhancer Switching model exemplifies the type of alternative frameworks we need to explore in order to elucidate the mechanisms driving the generation of gene expression waves during development. Consequently, an appropriate model system is required, allowing us to test not only the Enhancer Switching model but also any other prospective model that provides a satisfactory explanation for the initiation of gene expression waves at the enhancer level.”

      We hope that this addresses the reviewer’s quite legitimate concerns.

      Like the Introduction, the Discussion includes long paragraphs (lines 450-480) that are more suitable for a review/hypothesis paper. The data presented in this manuscript has little relevance to the question of kinematic vs. trigger waves, and therefore there is no real reason for the question to be discussed here.

      We have now significantly shortened the discussion.

      Reviewer #2 (Public Review):

      Open questions:

      What happens with the runB enhancer at later stages of embryogenesis? With what kind of dynamics do the anterior-most stripes fade and does that agree with the model? Do they show the same dynamics throughout segmentation? I think later stages need to be shown because the prediction from the model would be that the dynamics are repeated with each wave. I am not so sure about the prediction for ageing stripes – yet it would have been interesting to see the model prediction and the activity of the static enhancer.

      Yes, the dynamics repeat in the germband. This is shown in Supplementary Figure 8. The dynamics in the germband were shown by visualizing yellow mRNA and intronic probes. MS2 imaging could not be used because the embryo dives into the yolk for a while, after which it becomes difficult to capture the germband in the right orientation for imaging. We are currently working on using light sheet microscopy to image germband stages.

      I understand that the mRNA of the reporter gene yellow is more stable than the runt mRNA. This might interfere with the possibility to test your prediction for static enhancers: The criterion is that the stripes should increase in strength as the wave migrates towards the anterior. You show this for runB – but given that yellow has a more stable transcript – could this lead to a “false positive” increase in intensity with the slower migration and accumulation of transcripts? I would feel more comfortable with the statement that this is a static enhancer if you could exclude that the signal is blurred by an artifact based on different mRNA stability. What about re-running the simulation (with the parameters that have been shown to reflect endogenous runt mRNA levels well) but increasing the parameter for the stability of the mRNA? Are static and dynamic enhancers still distinguishable? The claim of having found a static enhancer rests on this increase in signal, hence, other explanations need to be excluded carefully.

      Good questions. Note that runB reporter dynamics were examined not only by visualizing yellow mRNAs (which indeed seem to be more stable than endogenous run mRNA; see Supplementary Figure 10), but also using MS2 (with virtually zero mRNA stability; although stability was simulated in the shown movies to show virtual mRNA dynamics), and intronic yellow mRNA (showing de novo transcription; Supplementary Figure 10; you will need to zoom in to see intronic de novo transcripts). The expected dynamics of a static enhancer reporter are quite distinctive: expression progressively increases initially as it propagates from posterior to anterior, then progressively decreases as it slows down and stabilizes at the anterior, and eventually fades. This full range of dynamics is obvious in germband embryos stained for intronic yellow to show de novo transcription of the runB enhancer reporter (Supplementary Figure 10; you will need to zoom in to see intronic de novo transcripts).

      Running the simulation for the model using different degradation rates for the enhancer reporter made the static enhancer’s expression either less or more persistent, but gave the same overall result: the static enhancer has diminished expression at the very posterior, but high expression as its expression wave exits the growth zone/SAZ. This is consistent not only with the yellow mRNA expression of runB, but with its intronic expression as well (Supplementary Figure 10; you will need to zoom in to see intronic de novo transcripts).
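      The general effect of transcript stability on persistence can be illustrated with a toy calculation. The sketch below is not the Enhancer Switching model and does not reproduce the simulations described above. It assumes a single cell in which a reporter is transcribed at a constant rate during a transient pulse and degraded with first-order kinetics, dm/dt = s(t) − δ·m, and all parameter values are arbitrary.

```python
def simulate_reporter(degradation, pulse_end=10.0, t_max=30.0, dt=0.01, synthesis=1.0):
    """Euler integration of dm/dt = s(t) - degradation * m.

    s(t) equals `synthesis` while t < pulse_end and 0 afterwards.
    Returns the list of mRNA levels at each time step (arbitrary units).
    """
    steps = int(round(t_max / dt))
    m = 0.0
    trace = []
    for k in range(steps):
        t = k * dt
        production = synthesis if t < pulse_end else 0.0
        m += dt * (production - degradation * m)
        trace.append(m)
    return trace


# Same transcriptional pulse, read out with a stable and an unstable transcript.
stable = simulate_reporter(degradation=0.1)
unstable = simulate_reporter(degradation=1.0)
```

      In this toy setting the more stable transcript accumulates to a higher level and persists longer after transcription stops, which is the kind of readout artifact the reviewer raises. It says nothing about where in the embryo expression appears, which is why the intronic (de novo transcription) data are the more direct test.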

      What about the head domain of the runB enhancer (e.g. Fig. 6A lowest row): This seems to be different from endogenous expression in your work and in Choe et al. Is that aspect different from endogenous expression and can this be reconciled with your model?

      Yes, indeed this aspect cannot be explained by our model. We believe that head patterning in insects is regulated by a different regulatory network. This network might be (de)-activated by missing repressors in the selected DNA segment for runB enhancer. We mentioned this issue in the revised manuscript.

      The claim of similar dynamics of expression visualized by in situ and MS2 in vivo relies on comparing Fig. 6C with 6A. To compare these two panels, I would need to know to what stage in A the embryo in C should be compared. Actually, the stripe in 6C appears more crisp than the stripes in 6A.

      Were the enhancer dynamics tested in vivo at later stages as well? I would appreciate a clear statement on what stages can be visualized and where the technical boundaries are because this will influence any considerations by others using this system.

      One cannot be highly precise about the timing of a process as dynamic in space and time as the one we are studying. We believe that Figure 6D clearly shows that runB activity dynamics are similar to endogenous run expression.

      How do the reported accessibility dynamics of runA enhancer correlate with the activity of the reporter: E.g. is the enhancer open in the middle body region but closed at the posterior part of the embryo? Or is it closed at the anterior – and if so: why is there a signal of the reporter in the head?

      You show that chromatin accessibility dynamics help in identifying active enhancers. Is this idea new or is it based on previous experience with Drosophila (e.g. PMID: 29539636 or works cited in https://doi.org/10.1002/bies.201900188)? Or in what respect is this novel?

      Our manuscript contributes to the growing body of evidence confirming that accessibility per se does not imply activity. Of course, this is not a new idea, but given the widespread use of accessibility as a proxy for enhancer activity in the genomics community, we do feel it is important to reiterate the message. As the reviewer correctly indicates, several published findings point to a correlation between accessibility dynamics and enhancer activity. However, to our knowledge, this is the first example in Tribolium. It is important to point out that what “dynamic” means strongly depends on the experimental design. Even in Drosophila, not enough studies have been conducted to fully understand the relationship (e.g., ideally, this should be done on a continuous time scale and at single-cell level). We acknowledge in the manuscript that this relationship has been observed before in other species (and have added the references suggested by the reviewer, for which we are very grateful), but still believe that our observations are highly significant to the Tribolium community.

      Reviewer #3 (Public Review):

      I have two major concerns: First, the claim about differential accessibility being related to enhancer activity is not really established from the presented data, in my view. This needs to be clarified. (I do believe in the claim to some extent, but not based on presented evidence.)

      We agree with the reviewer that more data – and, more importantly, independent replication – are necessary to confirm this finding. Please, refer to our response to your comment regarding the statistical significance of the findings.

      Second, the evidence in support of the Enhancer Switching model for runt should be accompanied by identification of and spatiotemporal profiling of the “speed regulator”, if this is not established yet.

      Experiments supporting the role of Cad as a speed regulator for both pair-rule and gap genes have been published in El-Sherif et al PLOS Genetics 2014 and Zhu et al PNAS 2017. We added a comment stressing this fact.

      In addition to these two concerns, the simulations of the Enhancer Switching model need to be described, at least in the outline, in the Methods section.

      Done

    1. Author Response

      Reviewer #1 (Public Review):

      Specifically, the authors define "efficacy" (eta) of a ligand as the fractional change in binding free energy between the open and the closed states of the channel.

      We assume that the word in quotes is a typo; η is efficiency, not efficacy (now given the symbol λ). We now emphasize the distinction immediately after Eq. 2.

      1) One concern regards the clustering of the data sets in Fig. 5 into exactly 5 eta-classes. First, two clusters contain only two data points each. Second, the proposed "catch&hold LFER model" (Fig. 2) does not predict the existence of a discrete number of such eta-classes. How strong is the evidence that there are exactly 5 classes as opposed to a continuum of possible eta values.

      Statistical (x-means cluster) analysis indicates that the 23 agonists segregate into 5 η classes. Groups with only 2 members (plus the intercept) are less well defined (Fig 4) but are supported by the 5 mutational η classes (Fig. 7). (see above)

      2) The authors do not discuss the uniqueness of the proposed model.

      see above. Ln 405 Induced fits are common.

      In fact, it seems to me that the existence of eta-classes might be explained just as well by an alternative model which assumes a single gating mechanism for the receptor,

      We are not sure what a “single gating mechanism” means. Does non-single refer to i) the 2-stage induced fits (catch-hold LFER)? … η classes make this conclusion unavoidable. ii) our conjecture that there are 5 different C versus O binding site structural pairs…? Energy derives from structure, so the 5 energy ratios indicate 5 structural pairs. iii) multiple steps inside gating (ϕ)? … So far there have not been any alternative explanations for the organized map of ϕ. iv) catch itself? … Evidence for this induced fit is given in Fig 2 and 7 SI, and on Ln 528-547 we discuss the implications of kon to C versus O. Ln 405 Local ‘induced fit’ rearrangements in enzymes are common. We think the evidence is strong for the bottom scheme in Fig 2A.

      but distinct patterns of ligand-protein interactions for the different agonists.

      η classes derive from distinct interactions for different agonists, but what these are and whether the ‘contact number’ idea is useful are uncertain (see above).

      The pore opening-associated increase in agonist affinity is typically caused by a tightening of the substrate binding site (often called clamshell closure) …

      Ln 379-386 In the Discussion we now relate catch-hold to induced fit

      Ln 455, 461-463, 471-474 Fig 2SI and the induced fit to clamshell closure

      Reviewer #2 (Public Review):

      This is an interesting manuscript with a worthwhile approach to receptor mechanisms. The paper contains an impressive amount of new data. These single molecule concentration response curves have been compiled with care and the authors deserve great credit for obtaining these data.

      Ln 233 η can be estimated from a CRC built from whole-cell currents…

      Ln 150 …or indeed any method that estimates KdC and KdO (for example binding assays, or perhaps in silico simulations of AC and AO structures)
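      To make the quantity under discussion concrete, the reviewers' description of efficiency (the fraction of agonist binding free energy that changes between the closed and open states) can be written as a short calculation from the two dissociation constants. The sketch below is an illustration under stated assumptions, not the authors' Eq. 2: it assumes binding free energies proportional to ln(KdC) and ln(KdO) with a 1 M standard state, which gives η = 1 − ln(KdC)/ln(KdO). The function name and the example values are hypothetical.

```python
import math


def efficiency(kd_closed, kd_open):
    """Fraction of binding free energy that changes between C and O.

    Assumes dissociation constants in molar units and free energies
    proportional to ln(Kd) (1 M standard state), so that
    eta = 1 - ln(KdC) / ln(KdO). Illustrative only.
    """
    if not (0.0 < kd_open < 1.0 and 0.0 < kd_closed < 1.0):
        raise ValueError("Kd values must be between 0 and 1 M for this sketch")
    return 1.0 - math.log(kd_closed) / math.log(kd_open)


# Hypothetical agonist: 100 uM affinity when closed, 10 nM when open.
eta_example = efficiency(kd_closed=1e-4, kd_open=1e-8)
```

      Under these assumptions an agonist whose affinity does not change between the two states has η = 0, and the measure depends only on the ratio of the two logarithms, which is why any method that estimates both constants could in principle be used.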

      I judge the main result to be that there are different values of the recently-proposed agonist-related quantity "efficiency".

      Ln 21, 26-27, 535-547 OK, but to us the most interesting insight is that in AChRs binding IS gating.

      These values are clustered into 5 quite closely spaced groups. The authors propose that these groups are the same whether considering mutations in the binding site or different agonists.

      see above

      It was unclear to me in several places, what new data and what old data are included in each figure. Therefore readers may have difficulty judging the claimed advance. This difficulty is not helped by the discussion, which includes some previous findings as "results".

      see above.

      A further weakness is that it is unclear how general or how specific these concepts are. The authors assert that they are, by definition, completely universal. However, we do not have reference to previous work or current data on any other receptor than the muscle nicotinic. I could not square the concept that "every receptor works like this" with the evident lack of desire to demonstrate this for any other receptor.

      Ln 132-136 There are reasons to think that receptors in general work according to Figure 1A. A thermalized ligand (for instance TriMA, MW 60) has the momentum of only ~3 water molecules. A momentum sensor would have terrible signal/noise.

      Reviewer #3 (Public Review):

      This work attempts to introduce a new attribute of the receptor- efficiency, a fraction of an agonist binding energy consumed by conformational transition of the receptor from resting to active (open) states. Furthermore, the authors use an impressive set of experimental data (single channel recordings with 23 agonists and 53 mutations) to measure the efficiency for each agonist and mutant receptor. All the estimated efficiencies fall into a few groups and inside each of the efficiency groups there is a strong correlation between agonist affinity and receptor opening efficacy.

      The main finding in this study is that estimated efficiencies fall into 5 groups.

      see above.

      There is no clear description of the method how the efficiencies were allocated into different groups. Most importantly, it is not clear if the method used takes into account the uncertainty of the efficiency estimate. The study does not show any statistical metrics of the efficiency estimates as well as any other calculated variable such as dissociation equilibrium constants to resting or open states. Surely, the uncertainty of the efficiency should matter especially considering how near the efficiency group values are (eg. difference about 10% between 0.51 and 0.56 or 0.41 and 0.45).

      see above

      All the tested agonists fell into groups according to the efficiency value attributed to them. It is difficult to see why some of the agonists belong to the same group. For example, it is not obvious at all why such agonists as epibatidine, decamethonium and TMP are in the same group. The question, I guess, arises if this grouping based on efficiency has any predictability value. Furthermore, if a series of mutations with the same agonist fall into different groups, the prediction power of this approach is very limited if one attempts to design a new agonist or look for a new mutation.

      see above and Ln 548-561 (last para of text). Efficiency is a relatively new idea. This report is one of only a few on the subject. More experiments with different receptors by more labs using other approaches are needed to ascertain whether η is general.

    1. Author Response:

      The following is the authors' response to the current reviews.

      We appreciate the thoughtful critiques of the reviewers. While we agree that performing additional experiments and analyses probing the sensitivity of the technique would be useful for future studies, we are unable to perform additional experiments as our lab has closed. We share this technique as a starting point for further investigation, but it may need to be modified for success in other contexts. We have provided details of the scenarios (life stage, feeding, day, number of ticks) where we successfully sequenced B. burgdorferi from ticks, as well as one where we did not (unfed nymphs) as a starting point. We will clarify in proofing that our qPCR experiments show that we capture the vast majority of B. burgdorferi flaB mRNA from our input samples, suggesting that we are likely capturing the majority of the B. burgdorferi.

      In this work, we were most interested in using RNA-seq to perform differential expression analysis between annotated mRNAs across our timepoints. We have provided the number of genes detected in each sample (92% of annotated transcripts on average) as well as the median number of reads covering each gene (604 on average) in the supplemental file containing sequencing statistics. This coverage is highly reproducible across replicates, with an average Pearson correlation of 0.99 between gene expression levels (as Transcripts Per Million) between any two replicates. These data and the fact that many of the gene expression changes we observed align with previous observations of others give us confidence in our differential expression analysis. For those interested in tRNAs or sRNAs, we think that it would be best to modify the protocol to focus specifically on capturing those sequences in the library preparation. We encourage others interested in other aspects of our data to download it and explore it.
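      The replicate-agreement metric mentioned above (Pearson correlation between expression levels expressed as Transcripts Per Million) can be sketched in a few lines. This is a minimal illustration, not the analysis code behind the reported values: the counts and gene lengths are invented, and the helper names are hypothetical.

```python
import math


def tpm(counts, lengths_bp):
    """Convert raw read counts to Transcripts Per Million."""
    # Reads per kilobase of transcript, then rescale so the sample sums to 1e6.
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical data: three genes, two replicates sequenced at different depth.
lengths = [1000, 2000, 500]
replicate_1 = tpm([100, 400, 50], lengths)
replicate_2 = tpm([200, 800, 100], lengths)
r = pearson(replicate_1, replicate_2)
```

      Because TPM normalizes for both gene length and sequencing depth, two replicates that differ only in depth give identical TPM profiles, so a high correlation reflects agreement in relative expression rather than library size.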

      We will correct remaining wording issues in proofing.

      —————

      The following is the authors' response to the original reviews.

      Dear Reviewing Editor,

      We thank you and the reviewers for the thoughtful comments on our manuscript, and we are excited to submit a revised version of our manuscript “Longitudinal map of transcriptome changes in the Lyme pathogen Borrelia burgdorferi during tick-borne transmission.” In response to the reviews, we have made the following changes to our manuscript:

      1. We updated the text for increased clarity around experimental details, including statistical analyses.

      2. We added additional details about the mapping of non-Bb reads as well as more information about Bb read coverage.

      3. We compared our differentially expressed genes to 4 previous studies of global transcriptional changes in different tick feeding contexts.

      4. We updated the discussion to address these comparisons as well as caveats of our study more directly.

      Please see our responses to individual comments below.

      Reviewer #1 (Public Review):

      In this study, Sapiro et al sought to develop technology for a transcriptomic analysis of B. burgdorferi directly from infected ticks. The methodology has exciting implications to better understand pathogen RNA profiles during specific infection timepoints, even beyond the Lyme spirochete. The authors demonstrate successful sequencing of the B. burgdorferi transcriptome from ticks and perform mass spectrometry to identify possible tick proteins that interact with B. burgdorferi. This technology and first dataset will be useful for the field. The study is limited in that no transcripts/proteins are followed-up by additional experiments and no biological interactions/infectious-processes are investigated.

      Critiques and Questions:

      We thank the reviewer for these thoughtful critiques and helping us improve our manuscript.

      This study largely develops a method and is a resource article. This should be more directly stated in the abstract/introduction.

      We edited the abstract and introduction to more directly state that we are sharing a new method and a resource for future investigations. (Lines 29-32; 101-103)

      Details of the infection experiment are currently unclear and more information in the results section is warranted. State the species of tick and life-stage (larval vs nymphal ticks) used for experiments. For RNA-seq, are mice infected and ticks naïve, or are ticks infected and transmitting Borrelia to uninfected mice?

      We updated the results section to more clearly state the tick species and life stage and to make it more clear that infected ticks are transmitting Bb to naïve mice. (Lines 113-115)

      What is the limit of detection for this protocol? Experimental data should be provided about the number of B. burgdorferi required to perform this approach.

      We performed this protocol on pools of 6 (for later feeding stages) to 14 (for early stages) infected nymphs. Published studies (PMID: 7485694, PMID: 11682544) suggest that one day after attachment, there may be a few thousand Bb per tick, suggesting what we’ve measured here may come from on the order of 10^4 Bb. We were not able to capture consistent data from Bb from unfed ticks, which may be due to lower numbers or to an altered transcriptional state caused by lack of nutrients in the unfed tick. We updated the discussion to reflect some of these limitations and uncertainties. (Lines 461-465)

      More information regarding RNA-seq coverage is required. Line 147-148 "read coverage was sufficient"; what defines sufficient? Browser images of RNA-seq data across different genes would be useful to visualize the read coverage per gene. What is the distribution of reads among tRNAs, mRNAs, UTRs, and sRNAs?

      As we were interested in differential expression analysis, we defined sufficient as the number of reads needed per gene to determine statistically significant expression changes across days, which with DESeq2 is typically 10 reads. We reworded this section for clarity and added additional information about the median number of reads per gene which is also useful in thinking about differential expression analysis. (Lines 163-170) As we chose to focus on differential expression analysis here, we believe these are most relevant metrics to cover.
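
The two coverage metrics mentioned above (the share of genes reaching a minimum read count, and the median number of reads per gene) can be sketched as follows. This is a minimal illustration only: the function name `coverage_summary`, the gene labels, and the read counts are hypothetical and are not taken from the study's pipeline or data. The 10-read default mirrors the threshold quoted in the response.

```python
from statistics import median


def coverage_summary(counts, min_reads=10):
    """Summarize per-gene read coverage for one sample.

    counts: dict mapping gene name -> number of reads assigned to that gene.
    Returns (fraction of genes with at least min_reads reads,
             median number of reads per gene).
    """
    if not counts:
        raise ValueError("counts must not be empty")
    values = list(counts.values())
    detected = sum(1 for c in values if c >= min_reads)
    return detected / len(values), median(values)


# Hypothetical read counts for four genes in one sample (made-up numbers).
example_counts = {"geneA": 604, "geneB": 12, "geneC": 3, "geneD": 950}
fraction_detected, median_reads = coverage_summary(example_counts)
```

Computing both numbers per sample, as in the sequencing statistics table the authors describe, makes it easy to check whether a sample with a lower mapping rate still has coverage comparable to the others.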

      My lab group was excited about the data generated from this paper. Therefore, we downloaded the raw RNA-seq data from GEO and ran it through our RNA-seq computational pipeline. Our QC analysis revealed that day 4 samples have a different GC% pattern and that a high percentage of E. coli sequences were detected. This should be further investigated and addressed in the paper: Are other bacteria being enriched by this method? Why would this be unique to day 4 samples? Does this affect data interpretation?

      We appreciate the interest in our data and pointing out this anomaly. We found that the day 4 samples do have a high percentage of reads that mapped to a bacterial species, Pseudomonas fulva, rather than ticks as we expected. (The reads that map to E. coli also map to P. fulva.) We have updated the results to include this information (Lines 156-165). We believe this is likely due to contamination from collecting ticks after they have fallen off mice in cages on day 4, rather than pulling ticks off the mice as in days 1-3. Unfortunately, as our lab has shut down, we cannot investigate the source further. We do think the high percentage of P. fulva reads suggests that other bacteria can be enriched with the anti-Bb antibody we used. We’ve updated the discussion to highlight this caveat. (Lines 459-460)

      While the presence of these bacterial reads did lower our overall Bb mapping rate and necessitate deeper sequencing for the day 4 samples, the Bb sequencing coverage of these samples is on par with samples from the other days in terms of percentage of genes with at least 10 reads and median number of reads per gene. Fewer than 0.0002% of the reads that map to Bb genes in any day 4 sample also map to P. fulva. We found that this small fraction of reads is dispersed across 334 genes in which an average of 0.05% (maximally 2.3%) of day 4 reads also map to P. fulva. Therefore, these bacterial reads do not change our interpretation of the results comparing gene expression across days, including day 4.

      Comprehensive data comparisons of this study and others are warranted. While the authors note examples of known differentially expressed genes (like lines 235-241), how does this global study compare to other global approaches? Are new expression patterns emerging with this RNA-seq approach compared to other methods? What differences emerged from day 1 to day 4 ticks compared to differences observed in unfed to fed ticks or fed ticks to DMC experiments? Directly compare to the following studies (PMID: 11830671; PMID: 25425211; PMID: 36649080).

      We added comparisons of our list of DE genes to those noted to change between “unfed tick” and “fed tick” culture conditions (PMID: 11830671 and 12654782), as well as fed nymph to DMC (PMID: 25425211 and 36649080) (Lines 231-252, Figure S4). These comparisons pointed us to two main findings: that global changes to Bb in different culture conditions generally agreed with the most dramatic changes we saw in our data, and that the timing of expression increases during feeding may relate to whether genes are more highly expressed in fed ticks or in mammalian conditions. Overall, the majority of our DE genes have been identified in at least one of these studies or in the other studies we compared to outlining RpoS, Rrp1, and RelBbu regulons. As many of these studies were asking slightly different questions and using different conditions and vastly different technology, we would expect some differences to arise from different contexts and some to be purely technical. The genes that were not seen in these previous studies tended to follow the same functional patterns we saw overall, heavily skewing towards genes of unknown function, outer surface proteins, and a handful of genes related to other functions. With the current state of the functional annotation of the genome, it is difficult to assess whether these amount to new expression patterns in and of themselves, so we focused on the overall trends in our data rather than those that were different from other studies.

      Details about the categorization of gene functions should be further described. The authors use functional analysis from Drechtrah et al., 2015, but that study also lacks details of how that annotation file was generated. Here, the authors have seemed to supplement the Drechtrah et al., 2015 list with bacteriophage and lipoprotein predictions - which are the same categories they focus their findings. Have they introduced a bias to these functional groups? While it can be noted that many lipoproteins are upregulated (or comment on specific genes classes), there are even more "unknown" proteins upregulated. I argue that not much can be inferred from functional analysis given the current annotation of the B. burgdorferi genome.

      We strongly agree that the current annotation of the Bb genome makes it difficult to perform meaningful global functional analysis, but we feel it is useful to get a general overview of gene functions. We described our methods for classifying genes into functional categories in the methods, in which we relied on previously published papers to make our best estimate of gene category (noted for each gene in Table S4). Due to the lack of annotations for many genes, we focused on the relatively well-defined category of lipoproteins, as these are overrepresented as a group in our upregulated genes, as well as phage genes, which are not necessarily overrepresented, but are still interesting to us. We hope that others will look at the data (particularly in Table S4, but also Table S3, or download the raw data and do their own analysis) with their own interests and biases and dig more into genes that we did not highlight specifically. We provide this data as a resource with the hope that some of the genes of unknown function that we see change here will be the subject of future functional studies so that this is less of a problem in the future.

      Reviewer #1 (Recommendations For The Authors):

      In general, the paper is well written and digestible for a broad audience. However, some of the figure graphics are unnecessary and take away from the data. Please label tick species and tick life-stage in Figure 1 drawings. The legend of Figure 1 requires citations. The Figure 4B graphic is unnecessary and the colors are confusing as they are too similar to the color palette of Figure 4A, where the colors have meaning. The Figure 5A graphic is unnecessary and takes away from the data embedded within it.

      We more clearly labeled the species in Figure 1 and added citations to the legend. We have simplified Figures 4A and 5A for clarity.

      Clarify lines 220-259 and Figure 3. What days are being compared? Downregulated genes should also be commented on.

      We considered our set of differentially expressed genes as those that changed two-fold (multiple hypothesis adjusted p-value < 0.05) in any of the three comparisons shown in Figure 2 (day 1 to day 2, day 1 to day 3, day 1 to day 4). We clarified this at multiple points in the results (e.g., Line 273). We commented on downregulated genes throughout, although as there were fewer genes and the magnitude of change was smaller, we focused more on upregulated genes.

      Line 327-329, state numbers not percentages. How many Bb proteins were actually detected?

      We updated this section to include numbers (Lines 371-374). In concordance with our sequencing data, we found (and were looking for) mainly tick proteins in this experiment.

      Data availability: B. burgdorferi and tick oligo sequences used for DASH should be provided in a supplemental table.

      We added a supplemental table of these sequences (Table S9). Please note they have been previously published in Dynerman et al. 2020 and Ring et al. 2022.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is overall well written and easy to follow. The data are compelling and support the conclusions. The discussion of this work is however highly insufficient and needs to be thoroughly edited:

      - Statistical analysis: The authors mention that DESeq2 was used. Please provide information on the type and the stringency of the tests used for differential gene expression analysis, including any additional potential correction for p-values (Bonferroni). The authors mention that genes with fold changes >2 were used for analysis, yet there is no information on the p-value cut off or if the genes with fold changes >2 were statistically significant. Please provide detail and rationale for the analysis.

      We clarified in the results and methods (Lines 200, 642-644) that we required an adjusted p-value < 0.05 from DESeq2’s Wald test with Benjamini-Hochberg correction along with a two-fold change when determining our genes of interest. As small fold changes showed statistically significant differences, we chose to set a fold change cutoff in most of our analysis to help us focus on the most highly expressed genes, like other studies we compared our data to. We included all of the DESeq2 results in Table S3 so that others may explore the data with different cutoffs if desired.
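
For readers who want to apply different cutoffs to a results table such as Table S3, the selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code and not DESeq2 itself: it takes raw p-values and log2 fold changes as given, applies a plain Benjamini-Hochberg adjustment, and does not reproduce other steps DESeq2 performs (for example independent filtering). The function names, gene labels, and numbers are hypothetical, and whether the fold-change boundary is inclusive is also an assumption.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_de_genes(results, padj_cutoff=0.05, min_fold=2.0):
    """Pick genes passing both the adjusted p-value and fold-change cutoffs.

    results: list of (gene, log2_fold_change, raw_pvalue) tuples.
    """
    padj = benjamini_hochberg([p for _, _, p in results])
    threshold = math.log2(min_fold)  # two-fold corresponds to |log2FC| >= 1
    return [
        gene
        for (gene, lfc, _), q in zip(results, padj)
        if q < padj_cutoff and abs(lfc) >= threshold
    ]


# Hypothetical results for four genes (made-up values).
example_results = [
    ("geneA", 2.5, 0.001),
    ("geneB", 0.4, 0.002),   # significant but below the fold-change cutoff
    ("geneC", -1.3, 0.03),   # downregulated, passes both cutoffs
    ("geneD", 1.8, 0.2),     # large change but not significant
]
selected = select_de_genes(example_results)
```

Requiring both conditions is what removes genes like the hypothetical `geneB`, where a small change is statistically significant, which is the situation the response describes.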

      - The field has been generating data on gene expression in ticks for decades. Yet, many of these studies are not referenced here. There is no discussion of how the data described here compares to what is known in the literature. For example, Venn diagrams or tables could be included for comparison with the data described lines 208-216. Extensive description and comparison of the data to the literature should be added in the discussion, and similarities/discrepancies should be discussed appropriately.

      We added additional comparisons to four different papers looking at global gene expression in Bb in the fed tick or tick-like culture conditions (Lines 231-252, Figure S4). This information as well as comparisons to transcriptional regulons (Figure S3) is available in Table S4. In addition to discussing some examples in the results, we added more information in the discussion regarding these comparisons (Lines 420-425). The majority of the genes that we see change over feeding have been previously noted to change expression during the enzootic cycle or be regulated by transcriptional programs active during this timeframe, and we have more clearly stated that. We focused on similarities here as these papers all ask slightly different questions in different contexts and use different technology which could all account for the many differences in individual genes between all of them and our work.

      - There is no discussion of the caveats of the study: for example, the authors are using an anti-OspA antibody, which could induce bias. The authors provide in-vitro pull down data supporting that this should not be an issue, but the pull down is performed from BSK-grown bacteria. This caveat should be discussed.

      We’ve added a paragraph to the discussion including this caveat and others (Lines 453-463).

      - Timing of RNA extraction: There is over 1h of delay between initial tick collection and RNA fixation. The effects of time on gene expression should be discussed.

      Although we were able to show that this timeframe did not affect cultured Bb gene expression, we added this to the discussion.

      - Gene expression is compared to Day 1. This introduces analyses bias as it does not allow identification of transcripts that first change upon initial feeding. This caveat should also be discussed

      We added this caveat – that we may miss gene expression changes in the first 24 hours of feeding – to the discussion.

      - This study is performed with 1 strain of B. burgdorferi on one tick species. Please provide perspective on the impact of these findings on Lyme disease causing spirochetes and their vectors broadly.

      We believe this method could be easily adaptable to study gene expression in other spirochete/vector pairs to determine similarities and differences and we added a comment to the discussion.

      - The discussion should also include insights on how to build on this work and include additional areas of method development to increase the recovery of B. burgdorferi from ticks or other organisms and facilitate future transcriptomic studies.

      We added a few ideas to the discussion noting that this protocol could be modified for use in other timeframes, with other antibodies, or in other organisms. We also highlight the recent advent of TBDCapSeq by Grassmann et al. that may be used in conjunction with this type of protocol.

      Minor comments:

      - Consider re-wording the description of the methods and findings to the third person for coherence.

      The majority of the methods are now written in third person.

      - Over 90% of the reads did not map to B. burgdorferi: please provide additional information on what these reads mapped to (tick or mouse), and if the data reflects what is known in the literature

      We have updated the results and discussion with information about the reads that do not map to Bb (Lines 156-166). The majority of reads mapped to the tick genome, which is what we expected. While a large number of reads in our day 4 samples unexpectedly mapped to Pseudomonas fulva, we do not believe this affects the interpretation of our data as we were still able to get broad genome coverage of Bb in these samples.

      - Please be more clear in the result section on the life stage of the ticks used for these studies.

      We have updated the results to clarify throughout.

      - Indicate how many total reads were generated for each sample

      This information is present in Table S1.

      - Provide statistical analyses for Figures 1C and D.

      We added t tests to determine statistical differences for these panels.

      Reviewing Editor (Recommendations for The Authors):

      1. It is important to mention in the abstract (line 27) that 'upregulated genes' is in comparison to day 1. This is also true in the introduction (lines 92-93).

      We updated the results and introduction to state more clearly that day 1 is our baseline measurement.

      2. It is also important to discuss in the manuscript that because your 'controls' are day 1 samples, initial transcriptome changes in response to the tick environment might be missed.

      This has been added in the discussion as a caveat (Lines 460-463).

      3. As someone who does not work with Bb, I would like to have seen a clearer description of what the feeding event looks like. Although there is some text in the introduction that touches on that ('prolonged nature of I. scapularis feeding'), I would like to see something even clearer. Maybe stating that feeding may take from x-y days would clarify that for the non-specialist.

      We updated the results to more clearly state that the tick falls off of the mice by around 4 days after feeding, our last time point (Lines 113-115). Additional details of tick feeding are also in the Figure 1 legend.

      4. In Fig. 3 linear DNA molecules seem to be drawn to scale. Is that also the case for plasmids? This could be clarified in the legend.

      The genome is drawn approximately to scale. We noted this and updated the legend with more information about how linear and circular plasmid names denote their size.

      5. Figure 5C: Colors are a bit confusing here. The legend indicates that they refer to fold changes, but the scale in the panel shows expression levels, not fold changes. Please clarify. Also, is this really TPM or RPKM? If comparisons of relative levels between different genes are made, number of reads should be normalized by gene length.

      The heatmap in Figure 4C does show expression levels, and we updated the legend to more clearly state this. The highlighted gene names are meant to show which genes change two-fold during this time (those present in panel A). The data are presented as TPM (transcripts per million), which, like RPKM, is normalized by gene length (PMID: 20022975).
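
To make the length normalization concrete, a TPM calculation can be sketched as follows. This is a generic illustration of the standard TPM definition, not the authors' pipeline; the function name `tpm`, the gene labels, and the counts and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Compute transcripts per million from read counts and gene lengths.

    counts: dict mapping gene -> read count.
    lengths_bp: dict mapping gene -> gene length in base pairs.
    """
    # Step 1: reads per kilobase, which removes the gene-length effect.
    rates = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    # Step 2: rescale so the values in one sample sum to one million.
    return {g: rate / total * 1e6 for g, rate in rates.items()}


# Two hypothetical genes with equal read counts but different lengths.
example_counts = {"geneA": 100, "geneB": 100}
example_lengths = {"geneA": 1000, "geneB": 2000}
example_tpm = tpm(example_counts, example_lengths)
```

In this made-up example the shorter gene receives twice the TPM of the longer one despite identical read counts, which is the sense in which TPM, like RPKM, accounts for gene length when relative levels of different genes are compared.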

    1. Author Response:

      The following is the authors' response to the original reviews.

      We have now incorporated the changes recommended by the reviewers to improve the interpretations and clarity of the manuscript. We are grateful for their thoughtful comments and suggestions, which have significantly strengthened the manuscript.

      Reviewer #1 (Public Review):

      Park et al demonstrate that cells on either side of a BM-BM linkage strengthen their adhesion to that matrix using a positive feedback mechanism involving a discoidin domain receptor (DDR-2) and integrin (INA-1 + PAT-3). In response to its extracellular ligand (Collagen IV/EMB-9), DDR-2 is endocytosed and initiates signaling that in turn stabilizes integrin at the membrane. DDR-2 signaling operates via Ras/LET-60. This work's strength lies in its excellent in vivo imaging, especially of endogenously tagged proteins. For example, tagged DDR-2:mNG could be seen relocating from seam cell membranes to endosomes. I also think a second strength of this system is the ability to chart the development of BM-BM linkage over time based on the stages of worm larval development. This allows the authors to show DDR signaling is needed to establish linkage, rather than maintain it. It likely is relevant to many types of cells that use integrin to adhere to BM and left me pondering a number of interesting questions.

      We thank the reviewer for highlighting the strengths and impact of our work in expanding our understanding of tissue linkages and how DDR and integrins might work in other contexts.

      For example: (1) Does DDR-2 activation require integrin? Perhaps integrin gets the process started and DDR-2 positively reinforces that (conversely is DDR-2 at the top of a linear pathway)?

      DDR activation by receptor clustering upon exposure to its ligand collagen is well documented (Juskaite et al., 2017 eLife PMID: 28590245). Clustered DDR is rapidly internalized into endocytic vesicles, where full activation of tyrosine kinase activity is thought to occur (Fu et al., 2013 J Biol Chem PMID: 23335507). Supporting this model, we found that concentrated type IV collagen is required for vesicular DDR-2 localization in the utse and seam cells at the utse-seam connection. Whether DDR-2 activation requires integrin has not been fully established. However, one study using mouse and human cell lines showed that DDR1 activation occurs independent of integrin (Vogel et al., 2000 J Biol Chem PMID: 10681566), consistent with the latter possibility raised by the reviewer that DDR-2 is upstream of integrin.

      To test these hypotheses, we require an experimental condition where loss or near complete loss of INA-1 integrin is achieved by the mid-to-late L4 larval stage, when DDR-2 is activated by collagen and taken into endocytic vesicles. Currently, we can only partially deplete INA-1 by RNAi (Figure 5—figure supplement 2E), and strong loss of function mutations in ina-1 result in early larval arrest and lethality (Baum and Garriga, 1997 Neuron PMID: 9247263). To overcome these obstacles, we are adapting the new FLP-ON::TIR1 system developed for precise spatiotemporal protein degradation in worms (Xiao et al., 2023 Genetics PMID: 36722258). We hope to achieve a near complete knockdown of ina-1 with this timed depletion strategy. In the future, we will use this system to block DDR-2 and integrin function specifically in the utse or seam cells, to complement our current dominant negative mis-expression approach.

      (2) In ddr-2(qy64) mutants, projections seem to form from the central portion of the utse cell. Does this reveal a second function for DDR-2, regulating perhaps the cytoskeleton?

      We thank the reviewer for their observation and agree with their interpretation. We think it is important to comment on this and have stated in the results text, lines 208-212: “In addition, membrane projections emanating from the central body of the utse were detected in ddr-2(qy64) animals. These projections were first observed at the mid L4 stage and persisted to young adulthood (Figure 2C). These observations suggest that DDR-2 functions around the mid L4 to late L4 stages to promote utse-seam attachment, and that DDR-2 may also regulate utse morphology.”

      And (3) can you use the forward genetic tools available in C. elegans to find new genes connecting DDR-2 and integrin?

      This is an excellent suggestion. We found that loss of ddr-2 strongly enhanced the uterine prolapse (Rup) defect caused by RNAi mediated depletion of integrin. To find new genes connecting DDR-2 and integrin, a targeted screen for the Rup phenotype could be performed in an integrin reduction of function condition. As we cannot work with null or strong loss-of-function ina-1 alleles (described above), the screen could be conducted with either timed depletion of INA-1 with candidate RNAi treatments, or combinatorial ina-1 RNAi with candidate RNAi treatments.

      I do see two areas where the manuscript could be improved. First, the authors rely on imprecise genetic methods to reach their conclusions (i.e. systemic RNAi, or expression of dominant negative constructs.) I think their conclusion would be stronger if they used tissue specific degradation to block ddr-2 function specifically in the utse or seam cells. Methods to do this are now regularly used in C. elegans and the authors have already developed the necessary tissue-specific promoters.

      We agree with the reviewer that tissue specific degradation of DDR-2 in the utse and seam cells will complement and strengthen our evidence for the site of action of DDR-2. As described earlier, we are currently adapting the FLP-ON::TIR1 tissue degradation system to perform these experiments and will provide our findings in a follow-up manuscript.

      Second, the manuscript is presented in the introduction as a study on formation and function of BM-BM linkage. The authors start the discussion in a similar manner. But their results are about adhesion between cells and BM. In fact they show the BM-BM linkage forms normally in ddr-2 mutants. Thus it seems like what they have really uncovered is an adhesion mechanism that works in parallel to the BM-BM linkage. Since ddr-2 appears to function equally in both utse + seam cells (based on their dominant negative data), there are likely three layers of adhesion (utse-BM, BM-BM, BM-seam) and if any of those break down, you get a partially penetrant rupture phenotype.

      The reviewer raises an important and interesting point, and we agree that we did not articulate the organization of the utse-seam tissue connection clearly. The utse-seam connection is comprised of the utse and seam BMs, each ~50 nm thick, and a connecting matrix bridging the two BMs, which is ~100 nm thick (Vogel and Hedgecock, 2001 Development PMID: 11222143). Type IV collagen builds up to high levels within the connecting matrix and links the utse and seam BMs, and its concentration is required for DDR-2 vesiculation. An important point we did not highlight is that type IV collagen is approximately 400 nm long (Timpl et al. 1981, Eur J Biochem PMID: 6274634). Thus, collagen molecules within the connecting matrix could span the entire length of the utse-seam connection and project into the utse and seam BMs to interact with cell surface receptors. Consistent with this possibility, we found that buildup of type IV collagen that spans the utse-seam BM-BM linkage correlated with the timing of DDR-2 activation/vesiculation within utse and seam cells. In addition, super-resolution imaging of the mouse kidney glomerular basement membrane (GBM), a tissue connection between endothelial BM and epithelial (podocyte) BM, showed that type IV collagen, which spans the BMs, projects into the endothelial and podocyte BMs (Suleiman et al., 2013 eLife PMID: 24137544). We carefully considered these points to generate the schematics in Figure 1A and Figure 8, but failed to articulate this point in the manuscript. We are grateful to the reviewer for bringing up our error and have now stated these details in the text to address the reviewer’s concern as outlined below.

      In the introduction (lines 93-96): “A BM-BM tissue connection between the large, multinucleated uterine utse cell and epidermal seam cells stabilizes the uterus during egg laying. The utse-seam connection is formed by BMs of the utse and the seam cells, each ~50 nm thick, which are bridged by an ~100 nm connecting matrix (Vogel and Hedgecock 2001, Morrissey, Keeley et al. 2014, Gianakas, Keeley et al. 2023).”

      In the discussion (lines 507-520): “We also found that internalization of DDR-2 at the utse-seam connection correlated with the assembly of type IV collagen at the BM-BM linkage and was dependent on type IV collagen deposition. Type IV collagen is ~400 nm in length and the utse-seam connecting matrix spans ~100 nm, while the utse and seam BMs are each ~50 nm thick (Timpl, Wiedemann et al. 1981, Vogel and Hedgecock 2001). Thus, collagen molecules in the connecting matrix could project into the utse and seam BMs to interact with DDR-2 on cell surfaces. Consistent with this possibility, super-resolution imaging of the mouse kidney glomerular basement membrane (GBM), a tissue connection between podocytes and endothelial cells, showed type IV collagen within the GBM projecting into the podocyte and endothelial BMs (Suleiman, Zhang et al. 2013). As DDR-2 is activated by ligand-induced clustering of the receptor (Juskaite, Corcoran et al. 2017, Corcoran, Juskaite et al. 2019), it suggests that the BM-BM linking type IV collagen network, which is specifically assembled at high levels, clusters and activates DDR-2 in the utse and seam cells to coordinate cell-matrix adhesion at the tissue linkage site.”

      These concerns do not undercut the significance of this work, which identifies an interesting mechanism cells use to strengthen adhesion during BM linkage formation. In fact, I am excited to read future papers detailing the connection between DDR-2 and integrin. But before undertaking those experiments the authors should be certain which cells require DDR-2 activity, and that should not be determined based solely on mis expression of a dominant negative.

      We thank the reviewer for recognizing the significance of our work and reiterate that we will use tissue-specific degradation for site of action experiments in future studies on the biology of the utse-seam tissue linkage.

      Reviewer #2 (Public Review):

      This paper explores the mechanisms by which cells in tissues use the extracellular matrix (ECM) to reinforce and establish connections. This is a mechanistic and quantitative paper that uses imaging and genetics to establish that the Type IV collagen receptor DDR-2 (discoidin domain receptor 2) signals through Ras to strengthen an adhesion between two cell types in C. elegans. This connection needs to be strong and robust to withstand the pressure of the numerous eggs that pass through the uterus. The major strengths of this paper are in crisply designed and clear genetic experiments, beautiful imaging, and well supported conclusions. I find very few weaknesses, although, perhaps the evidence that DDR-2 promotes utse-seam linkage through regulation of MMPs could be stronger. This work is impactful because it shows how cells in vivo make and strengthen a connection between tissues through ECM interactions involving collaboration between discoidin and integrin.

      We appreciate the reviewer’s assessment of the impact of our work in detailing a mechanism for how cells increase their adhesion to the ECM to establish connections between adjacent tissues. We have softened the interpretation of our MMP localization data to address the reviewer’s concern (detailed below).

      Reviewer #1 (Recommendations For The Authors):

      Regarding Figure 1D, is it possible to show when the BM forms on the cartoons more clearly (something like the 3rd section of Fig 3A)? I can see it in the timeline but it's hard to follow in the diagrams.

      We agree with the reviewer that we could show when the BM-BM connecting matrix forms more clearly in Figure 1D. Hemicentin and fibulin, the earliest components of the connecting matrix, are detected at very low levels at the utse-seam connection during the mid-L4 stage and are more prominently localized by the mid-to-late L4 stage (Gianakas et al., 2023 J Cell Biol PMID: 36282214). For this reason, we only show the connecting matrix in yellow from the mid-to-late L4 stages onward. We have now made the BM-BM connection more prominent in the figure 1D cartoons with boxed outlines (similar to Figure 3A as the reviewer suggested). We also added a label for the time window when the BM-BM connection forms.

      Regarding the RNAi induced prolapse phenotype, looking at 2B, it appears that between 5% and 10% of animals have uterine prolapse when fed control RNAi. Is this correct, it seems very high? This prolapse in control animals was not observed in other RNAi experiments such as Figure 5C.

      We thank the reviewer for pointing this out. For Figure 2B, the control used was wild-type N2 animals fed with OP50 E. coli bacteria, rather than HT115 bacteria carrying the L4440 empty vector (control RNAi). This is because the main comparisons were to five ddr-1 and ddr-2 mutant strains. We did notice a slightly higher baseline uterine prolapse frequency (5% on average, detailed in Figure 2—Source data 1) in wild-type animals fed OP50 bacteria, compared to HT115 bacteria fed animals (approximately 1-2% on average). It is possible this could be linked to the nutritional differences in the two bacterial strains. However, we are confident of our data in Figure 2B as we carried out 3 independent trials, and the uterine prolapse frequencies in ddr-1 mutant animals matched the baseline in wild-type animals, while the frequencies for ddr-2 mutants were all increased over the baseline in all trials (as detailed in Figure 2—Source data 1).

      Relating to the point above, in reading the methods to try to understand how they did the RNAi, I noticed that they measure prolapse continually over five days. I didn't realize it takes a long time to occur. I think they should explain this in the text and in the figures. Reading the manuscript I thought prolapse occurred as soon as mutant animals began laying eggs. In the text they should explain this when they first assay the phenotype (page 7), and for figures the Y axis on the graphs could say "% uterine prolapse after 5 days."

      We thank the reviewer for their suggestions. We did not articulate clearly that the utse-seam connection is able to withstand some mechanical stress, even when key components are lost. It is only over time and with repeated use that the connection breaks down. This is likely because a number of components contribute to the connection and, as we have shown previously, there is feedback, such that when one component, such as collagen, is reduced, hemicentin levels increase at the BM-BM connection. Since ruptures arising from utse-seam detachments typically occur sometime after the onset of egg-laying, we screened the entire egg-laying period (days two to five post-L1) as described in Gianakas et al. 2023. We have now incorporated these points in the text and figures as follows:

      In the introduction, we clarified that the utse-seam BM-BM connection breaks down over time, by adding (lines 99-105): “Hemicentin promotes the recruitment of type IV collagen, which accumulates at high levels at the BM-BM tissue connection and strengthens the adhesion, allowing it to resist the strong mechanical forces of egg-laying. The utse-seam connection is robust, with each component of the tissue-spanning matrix contributing to the BM-BM connection (Gianakas, Keeley et al. 2023). This likely accounts for the ability of the utse-seam connection to initially resist mechanical forces after loss of any one of these components, delaying the uterine prolapse phenotype until sometime after the initiation of egg-laying.”

      We expanded the results text when we first describe the Rup phenotype (lines 183-184): “We first screened for the Rup phenotype caused by uterine prolapse, observing animals every day during the egg-laying period, from its onset (48 h post-L1) to end (120 h) (Methods)”.

      We provided more detail in the Methods section (lines 784-790): “Uterine prolapse frequency was assessed as described previously (Gianakas et al 2023). Briefly, synchronized L1 larvae were plated (~20 animals per plate) and after 24 h, the exact number of worms on each plate was recorded. Plates were then visually screened for ruptured worms (uterine prolapse) every 24 h during egg-laying (between 48 h to 120 h post-L1). We chose to examine the entire egg-laying period as ruptures arising from utse-seam detachments do not usually occur at the onset of egg-laying, but after cycles of egg-laying that place repeated mechanical stress on the utse-seam connection (Gianakas et al 2023).”
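
      The pooled scoring described in this Methods excerpt can be sketched as follows. This is only an illustration, not the authors' analysis code: the plate counts are hypothetical, the function name `pooled_rup_frequency` is invented here, and the sketch assumes each ruptured animal is counted once across the daily checks.

```python
def pooled_rup_frequency(plates):
    """Return the pooled percentage of animals with uterine prolapse.

    Each plate is a dict with:
      "n"        - number of worms counted 24 h after plating (assumed denominator)
      "ruptures" - newly ruptured worms seen at each daily check (48-120 h post-L1)
    """
    total_animals = sum(plate["n"] for plate in plates)
    if total_animals == 0:
        raise ValueError("no animals were scored")
    # Pool every daily observation into a single count for the whole window.
    total_ruptured = sum(sum(plate["ruptures"]) for plate in plates)
    return 100.0 * total_ruptured / total_animals


# Hypothetical trial: two plates checked at 48, 72, 96 and 120 h post-L1.
example_plates = [
    {"n": 20, "ruptures": [0, 1, 0, 1]},
    {"n": 19, "ruptures": [0, 0, 1, 0]},
]
frequency = pooled_rup_frequency(example_plates)
print(f"Pooled Rup frequency: {frequency:.1f}%")
```

      Keeping the per-day counts in a list, rather than only the pooled total, would also allow a time course to be reconstructed later.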

      Finally, we modified the Y-axes of graphs in Figure 2B and 5C and the respective figure legends as suggested by the reviewer.

      Then I went back and compared to the previous publication (Gianakas, 2023). I would be interested to see a time course of how many animals prolapse after 1 day, 2 days, etc.? Is this consistent with their data on hemicentin?

      We agree with the reviewer that a time course of uterine prolapse would be interesting as we saw ruptures occur throughout the egg-laying period. However, for the hemicentin knockdown experiments in Gianakas et al. 2023 as well as the experiments in this study, we recorded only the pooled number of animals with ruptures at the end of the experimental window. In future studies we will also record the uterine prolapse frequencies on each day to generate time courses that will provide more insight into the function of proteins at the utse-seam connection.

      Lines 183-184: I'm not sure what it means to say "trended towards displaying a significant Rup phenotype?" Since the difference was not statistically significant, it would be better to say something like "increased but not statistically significant."

      We agree with the reviewer and have now modified this sentence (lines 190-193): “Animals carrying the ddr-2(ok574) allele, which deletes a portion of the intracellular kinase domain (Unsoeld, Park et al. 2013), also showed an increased frequency of the Rup phenotype compared to wild-type animals, although this difference was not statistically significant (Figure 2A and B)”.

      Line 186: 'penetrant' needs a qualifier to indicate the magnitude of the proportion of individuals with the phenotype.

      As we provide the Rup frequency numbers in Figure 2—Source data 1, we modified the sentence as follows (lines 193-195): “We further generated a full-length ddr-2 deletion allele, ddr-2(qy64), and confirmed that complete loss of ddr-2 led to a significant uterine prolapse defect (Figure 2A and B).”

      Lines 206-208; could the mounting/imaging procedure (which I assume requires squeezing the worm between agarose pad and coverslip) alter the occurrence of prolapse? I would think prolapse would occur more frequently under these conditions as compared to worms laying eggs on a plate.

      The reviewer brings up an important concern. The mounting and imaging procedure does require placing the worm between an agarose pad and a coverslip. However, this did not alter the occurrence of uterine prolapse in this experiment. We were careful to perform the same procedure on both wild-type and ddr-2(qy64) animals to control for this. As detailed in the manuscript, none of the eight wild-type animals we mounted underwent uterine prolapse after recovery off the coverslip, and among the ddr-2(qy64) mutants we mounted, only the ones that exhibited utse-seam detachments went on to rupture later.

      We articulated these points more clearly by modifying lines 214-216 as follows: “Wild-type and ddr-2(qy64) animals were mounted and imaged at the L4 larval stage for utse-seam attachment defects, recovered, and tracked to the 72-hour adult stage, where they were examined for the Rup phenotype.”

      In seam cells you can see that DDR-2:mNG is present at membranes from early to mid L4, which makes sense. But I cannot see it on the membrane at any time point in the utse. Perhaps it is obscured by the yellow dotted line. Should it be visible on utse membranes before it is endocytosed?

      The reviewer raises an interesting question. We think it is likely that DDR-2 is initially on the membrane of the utse like it is on the seam cells. However, we have not observed this, possibly due to the complex shape and thin membrane extensions of the utse. We are unable even to detect clear membrane enrichment of membrane markers in the utse (for example, compare the utse and seam membrane markers in Figure 3B). Thus, we refrained from speculating on DDR-2 utse membrane localization in the manuscript, and instead focused on the pattern of vesicular DDR-2 peaking at the late L4 stage, which was clearly visible in both the utse and seam cells.

      Sup Fig 3A - please show quantification of seam cells not contacting utse at the same Y-axis scale as for regions that do contact utse.

      We have modified the Y-axis scale for the quantification of the seam region not contacting the utse.

      Figure 4A - I don't see a difference between WT and ok574 - what am I missing?

      In the representative ok574 animal shown, a portion of the utse arm on the top right is detached from the seam. To make this phenotype clearer, we have recropped the image panels, readjusted the brightness and contrast of the utse and the seam, and redrawn the outline of the detachment to make this clearer.

      Figure 4C+D, and lines 296-298: I'd bet that both are needed to recruit DDR-2 to membranes. But him-4 has a more severe phenotype because the RNAi knockdown is much more effective (perhaps b/c they are using the newer t444t vector).

      We agree with the reviewer that the him-4 knockdown phenotype is likely more severe than emb-9 knockdown. Type IV collagen at the utse-seam connection is very stable compared to hemicentin (Gianakas et al 2023, J Cell Biol PMID: 36282214, see Fig. 5C), which could explain the lower knockdown efficiency.

      We modified our interpretation of the data in the text as follows (lines 308-312): “In addition, we did not detect DDR-2 at the cell surface, suggesting that hemicentin has a role in recruiting DDR-2 to the site of utse-seam attachment. It is possible that collagen could also function in DDR-2 recruitment, but we could not assess this definitively due to the lower knockdown efficiency of emb-9 RNAi (Figure 4—figure supplement 1A).”

      Reviewer #2 (Recommendations For The Authors):

      Line 218 DDR-2 (typo)

      We have corrected this typo.

      Evidence (line 344-348) may not be strong enough to say whether or not DDR-2 promotes utse-seam linkage through regulation of MMPs.

      We agree with the reviewer and have softened our conclusions as follows (lines 356-363): “The C. elegans genome harbors six MMP genes, named zinc metalloproteinase 1-6 (zmp-1-6) (Altincicek, Fischer et al. 2010). We examined four available reporters of ZMP localization (ZMP-1::GFP, ZMP-2::GFP, ZMP-3::GFP, and ZMP-4::GFP) (Kelley, Chi et al. 2019). Only ZMP-4 was detected at the utse-seam connection and its localization was not altered by knockdown of ddr-2 (Figure 5—figure supplement 1F). These observations suggest that DDR-2 does not promote utse-seam linkage through regulation of MMPs, although we cannot rule out roles for DDR-2 in promoting the expression or localization of ZMP-5 or ZMP-6.”

      The authors show the critical period is in late L4, however, is the signaling needed later too? For example, is the linkage strengthening moderated by DDR-2 important as more eggs accumulate?

      The reviewer raises an interesting question. We observed that the vesicular localization of DDR-2 sharply declined before the onset of egg-laying. By young adulthood, very few punctate structures of DDR-2 were observed in the seam cells, and none in the utse (Figure 3B). Furthermore, the frequency of utse-seam detachments in ddr-2 mutant animals peaked by the late L4 stage and did not increase after this time, suggesting DDR-2 function is no longer required after the late L4 stage (Figure 2D). Thus, we believe that DDR-2 signaling strengthens tissue linkage only during the early formation of the utse-seam connection between the mid and late L4 stage.

      We incorporated these points in the discussion (lines 477-485): “Through analysis of genetic mutations in the C. elegans receptor tyrosine kinase (RTK) DDR-2, an ortholog to the two vertebrate DDR receptors (DDR1 & DDR2) (Unsoeld, Park et al. 2013), we discovered that loss of ddr-2 results in utse-seam detachment beginning at the mid L4 stage. The frequency of detachments in ddr-2 mutant animals peaked around the late L4 stage and did not increase after this time. This correlated with the levels of DDR-2::mNG at the utse-seam connection, which peaked at the late L4 stage and then sharply declined by adulthood. Together, these findings suggest that DDR-2 promotes utse-seam attachment in the early formation of the tissue connection between the mid and late L4 stage.”

      Fig. 3B is the fluorescence quantification normalized to the area?

      Yes, it is. We used mean fluorescence intensity for all fluorescence quantifications to normalize for the area where the signal was measured. We added a line in Methods to emphasize this (lines 739-740): “We measured mean fluorescence intensity for all quantifications in order to account for linescan area.”
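
      As a minimal sketch of why mean intensity accounts for measurement area, the example below compares two hypothetical linescans with identical per-pixel brightness but different sizes. The pixel values and the function name `mean_intensity` are illustrative assumptions, not values or code from the study.

```python
def mean_intensity(pixel_values):
    """Mean fluorescence intensity: summed signal divided by the number of pixels."""
    if not pixel_values:
        raise ValueError("region contains no pixels")
    return sum(pixel_values) / len(pixel_values)


# Hypothetical linescans: same brightness per pixel, but one covers twice the area.
short_linescan = [100, 120, 110]
long_linescan = [100, 120, 110, 100, 120, 110]

# The integrated (summed) signal scales with area...
print(sum(short_linescan), sum(long_linescan))
# ...whereas the mean intensity is the same for both regions.
print(mean_intensity(short_linescan), mean_intensity(long_linescan))
```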

      Fig. 4B a statistical assessment of the degree of co-localization of DDR-2::mNG and the endosomal markers might be a nice addition.

      We believe the reviewer is referring to Figure 3—figure supplement 1B. We have now added the statistical assessment of the degree of co-localization of DDR-2::mNG and the endosomal markers.

      We want to sincerely thank the two reviewers for their thoughtful comments and suggestions. The changes we have made in response to these comments have substantially improved the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This study by Park et al. describes an interesting approach to disentangle gene-environment pathways to cognitive development and psychotic-like experiences in children. They have used data from the ABCD study and have included PGS of EA and cognition, environmental exposure data, cognitive performance data and self-reported PLEs. Although the study has several strengths, including its large sample size, interesting approach and comprehensive statistical model, I have several concerns:

      • The authors have included follow-up data from the ABCD Study. However, it is not very clear from the beginning that longitudinal paths are being explored. It would be very helpful if the authors would make their (analysis) approach clearer from the introduction. Now, they describe many different things, which makes the paper more difficult to read. It would be of great help to see the proposed path model in a Figure and refer to that in the Method.

      We clarified the specific longitudinal paths explored in our study in the end of the Introduction section (line 149~160). We also added a figure of the proposed path model (Figure 1) and refer to it in the Method section (line 232~239).

      • There is quite a lot of causal language in the paper, particularly in the Discussion. My advice would be to tone this down.

      We corrected and toned down all causal language used in our manuscript. Per your suggestion, we deleted statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead.

      • I feel that the limitation section is a bit brief, and can be developed further.

      We specified additional potential constraints of our study, including limited representativeness, limited periods of follow-up data, possible sample selection bias, and the use of non-randomized, observational data. These corrections can be found in line 518~538.

      • I like that the assessment of CP and self-reports PEs is of good quality. However, I was wondering which 4 items from the parent-reported CBCL were used and how did they correlate with the child-reported PEs? And how was distress taken into account in the child self-reported PEs measurement? Which PEs measures were used?

      We believe that Reviewer #1’s question about the correlations between the PLEs derived from the PQ-BC (total score and distress score PLEs) and from the CBCL (parent-rated PLEs) may refer to the prior version of our manuscript submitted to a different journal. We obtained Pearson’s correlation coefficients between the PLEs (baseline year: r = 0.095~0.0989, p<0.0001; 1-year follow-up: r = 0.1322~0.1327, p<0.0001; 2-year follow-up: r = 0.1569~0.1632, p<0.0001) and added this information to the Method section for PLEs (line 198~201).

      • What was the correlation between CP and EA PGSs?

      We also added the Pearson’s correlation between the two PGSs (r =0.4331, p<0.0001) in the Methods section for PGS (line 214~215).

      • Regarding the PGS: why focus on cognitive performance and EA? It should be made clearer from the introduction that EA is not only measuring cognitive ability, but is also a (genetic) marker of social factors/inequalities. I'm guessing this is one of the reasons why the EA PGS was so much more strongly correlated with PEs than the CP PGS. See the work by Abdellaoui and the work by Nivard.

      We thank the reviewer for the feedback to clarify that educational attainment (EA) is not only a genetic marker of cognitive ability but also that of socioeconomic outcomes. Per your suggestion, we included the associations of EA PGS with multiple biological and socioeconomic outcomes found in prior studies (e.g., Abdellaoui et al., 2022) in the Introduction (line 131~142).

      Abdellaoui, A., Dolan, C. V., Verweij, K. J. H., & Nivard, M. G. (2022). Gene–environment correlations across geographic regions affect genome-wide association studies. Nature Genetics. doi:10.1038/s41588-022-01158-0

      • Considering previous work on this topic, including analyses in the ABCD Study, I'm not surprised that the correlation was not very high. Therefore, I don't think it makes a whole lot of sense to adjust for the schizophrenia PGS in the sensitivity analyses, in other words, it's not really 'a more direct genetic predictor of PLEs'.

      We conducted this adjustment considering that PLEs often precede the onset of schizophrenia. In addition, prior studies found that schizophrenia PGS is significantly associated with cognitive intelligence within psychosis patients (Shafee et al., 2018) and individuals at-risk of psychosis (He et al., 2021), and that significant distress psychotic-like experiences had greater positive correlation with schizophrenia PGS than PGS for psychotic-like experiences (Karcher et al., 2018).

      For these reasons, we thought that it is necessary to assess whether the effects of cognitive phenotypes PGS (i.e., CP PGS and EA PGS) in the linear mixed model are significant after adjusting for schizophrenia PGS. We believe our results from the mixed linear model showed the sensitivity and specificity of the association between cognitive phenotype PGS and PLEs.

      He, Q., Jantac Mam-Lam-Fook, C., Chaignaud, J., Danset-Alexandre, C., Iftimovici, A., Gradels Hauguel, J., . . . Chaumette, B. (2021). Influence of polygenic risk scores for schizophrenia and resilience on the cognition of individuals at-risk for psychosis. Translational Psychiatry, 11(1). doi:10.1038/s41398-021-01624-z

      Karcher, N. R., Paul, S. E., Johnson, E. C., Hatoum, A. S., Baranger, D. A. A., Agrawal, A., . . . Bogdan, R. (2021). Psychotic-like Experiences and Polygenic Liability in the Adolescent Brain Cognitive Development Study. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. doi:https://doi.org/10.1016/j.bpsc.2021.06.012

      Shafee, R., Nanda, P., Padmanabhan, J. L., Tandon, N., Alliey-Rodriguez, N., Kalapurakkel, S., . . . Robinson, E. B. (2018). Polygenic risk for schizophrenia and measured domains of cognition in individuals with psychosis and controls. Translational Psychiatry, 8(1). doi:10.1038/s41398-018-0124-8

      • How did the FDR correction for multiple testing affect the results?

      For all analysis results presented in our study, False Discovery Rate (FDR) correction for multiple testing compared p-values of nine key study variables: PGS (cognitive performance or educational attainment), family income, parental education, family’s financial adversity, Area Deprivation Index, years of residence, proportion of population below -125% of the poverty line, positive parenting behavior, and positive school environment. An exception was the sensitivity analysis that included schizophrenia PGS in the linear mixed model for adjustment: with another PGS variable added, FDR correction compared p-values of ten key variables. Overall, the effects of FDR correction on the results were limited; i.e., the majority of associations between the key variables and the outcomes, which were deemed highly significant, remained unchanged after the FDR correction.
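
      The response does not name the FDR procedure that was used. Purely as an illustration, the sketch below assumes the widely used Benjamini-Hochberg method and applies it to nine hypothetical p-values, one per key study variable. The function name and the p-values are assumptions made for this example and are not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for nine key variables (not taken from the paper).
raw_p = [0.0001, 0.0004, 0.002, 0.01, 0.03, 0.04, 0.20, 0.45, 0.80]
adjusted_p = benjamini_hochberg(raw_p)
for raw, adj in zip(raw_p, adjusted_p):
    print(f"raw p = {raw:.4f}  adjusted p = {adj:.4f}  significant = {adj < 0.05}")
```

      Under this assumed procedure, very small raw p-values change little after adjustment, which is consistent with the statement that highly significant associations remained unchanged.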

      Overall, I feel that this paper has the potential to present some very interesting findings. However, at the moment the paper misses direction and a clear focus. It would be a great improvement if the readers would be guided through the steps and approach, as I think the authors have undertaken important work and conducted relevant analyses.

      We express our appreciation to the reviewer for the constructive feedback and guidance, which has significantly contributed to the improvement of our manuscript. As addressed in the preceding sections, we have implemented the necessary corrections and clarifications in response to the reviewer's suggestions. We remain open to making further amendments as needed, and thus invite any additional comments should any aspect of our revisions be deemed inadequate or inappropriate.

      Reviewer #2 (Public Review):

      This paper tried to assess the link between genetic and environmental factors on psychotic-like experiences, and the potential mediation through cognitive ability. This study was based on data from the ABCD cohort, including 6,602 children aged 9-10y. The authors report a mediating effect, suggesting that cognitive ability is a key mediating pathway in the link between several genetic and environmental (risk and protective) factors on psychotic-like experiences.

      While these findings could be potentially significant, a range of methodological unclarities and ambiguities make it difficult to assess the strength of evidence provided.

      Strengths of the methods:

      The authors use a wide range of validated (genetic, self- and parent-reported, as well as cognitive) measures in a large dataset with a 2-year follow-up period. The statistical methods have the potential to address key limitations of previous research.

      We sincerely thank the reviewer for recognizing these methodological strengths of our study. The reviewer’s positive comments are highly supportive and encouraging for us.

      Weaknesses of the methods:

      The rationale for the study is not completely clear. Cognitive ability is probably a more likely mediator of traits related to negative symptoms in schizophrenia, rather than positive symptoms (e.g., psychosis, psychotic-like symptom). The suggestion that cognitive ability might lead to psychotic-like symptoms in the general population needs further justification.

      We sincerely thank the reviewer and appreciate the concerns raised regarding our proposal that cognitive ability may serve as a mediator of psychotic-like experiences. To the best of our knowledge, cognitive ability has been proposed as a mediator of positive symptoms in schizophrenia (including psychotic-like experiences), as well as negative symptoms. This mediating role was proposed in several prior studies on cognitive models of schizophrenia/psychosis. Per your suggestion, we included further justification in the Introduction section of our study (line 104~107). Specifically, we highlighted that cognitive ability has been theoretically proposed as a potential mediator of genetic and environmental influences on positive symptoms of schizophrenia such as psychotic-like experiences. We refer to studies conducted by Howes & Murray (2014) and Garety et al. (2001).

      Howes, O. D., & Murray, R. M. (2014). Schizophrenia: an integrated sociodevelopmental-cognitive model. The Lancet, 383(9929), 1677-1687. doi:https://doi.org/10.1016/S0140-6736(13)62036-X

      Garety, P. A., Kuipers, E., Fowler, D., Freeman, D., & Bebbington, P. E. (2001). A cognitive model of the positive symptoms of psychosis. Psychological Medicine, 31(2), 189-195. doi:10.1017/S0033291701003312

      Terms are used inconsistently throughout (e.g., cognitive development, cognitive capacity, cognitive intelligence, intelligence, educational attainment...). It is overall not clear what construct exactly the authors investigated.

      Thank you for your comment. We corrected the term ‘cognitive capacity’ to ‘cognitive phenotypes’ throughout our manuscript. We also added in the Introduction (line 141~143) that we will collectively refer to these two PGSs of focus as ‘cognitive phenotypes PGSs’, which is similar to the terms used in prior research (Joo et al., 2022; Okbay et al., 2022; Selzam et al., 2019).

      Joo, Y. Y., Cha, J., Freese, J., & Hayes, M. G. (2022). Cognitive Capacity Genome-Wide Polygenic Scores Identify Individuals with Slower Cognitive Decline in Aging. Genes, 13(8), 1320. doi:10.3390/genes13081320

      Okbay, A., Wu, Y., Wang, N., Jayashankar, H., Bennett, M., Nehzati, S. M., . . . Young, A. I. (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics, 54(4), 437-449. doi:10.1038/s41588-022-01016-z

      Selzam, S., Ritchie, S. J., Pingault, J.-B., Reynolds, C. A., O’Reilly, P. F., & Plomin, R. (2019). Comparing Within- and Between-Family Polygenic Score Prediction. The American Journal of Human Genetics, 105(2), 351-363. doi:https://doi.org/10.1016/j.ajhg.2019.06.006

      Not the largest or most recent GWASes were used to generate PGSes.

      Thank you for mentioning this point. We were unable to use the largest GWASs for cognitive intelligence, educational attainment, and schizophrenia because our study began before the GWASs by Okbay et al. (2022) and Trubetskoy et al. (2022) were published. We corrected the text to state that our study used ‘a GWAS of European-descent individuals for educational attainment and cognitive performance’ instead of the largest GWAS (line 206~208).

      It is not fully clear how neighbourhood SES was coded (higher or lower values = risk?). The rationale, strengths, and assumptions of the applied methods are not fully clear. It is also not clear how/if variables were combined into latent factors or summed (weighted by what). It is not always clear when genetic and when self-reported ethnicity was used. Some statements might be overly optimistic (e.g., providing unbiased estimates, free even of unmeasured confounding; use of representative data).

      Consistent with the illustration of neighborhood SES in the Methods section, higher values of neighborhood SES indicate risk. In the original Figure 2, higher values of neighborhood SES link to lower intelligence (direct effects: β=-0.1121) and higher PLEs (indirect effects: β=-0.0126~ -0.0162). We think such confusion might have been caused by the difference between family SES (higher values = lower risk) and neighborhood SES (higher values = higher risk). Thus, we changed the terms to ‘High Family SES’ and ‘Low Neighborhood SES’ in the corrected figure (Figure 3) for clarification.

      Considering that shorter duration of residence may be associated with instability of residency, it may indicate neighborhood adversity (i.e., higher risk). This definition of the ‘years of residence’ variable is in line with the previous study by Karcher et al. (2021).

      We represented PGSs, family SES, neighborhood SES, positive family and school environment, and PLEs as composite indicators (derived from a weighted sum of relevant observed variables). To the best of our knowledge, prior studies suggest that these variables are less likely to share a common factor, and they have been assessed as composite indices during analyses. For instance, Judd et al. (2020) and Martin et al. (2015) analyze the genetic influences of educational attainment and ADHD as composite indicators. Also, as mentioned in Judd et al. (2020), socioenvironmental influences are often analyzed as composite indicators. Studies on the psychosis continuum (e.g., van Os et al., 2009) suggest that psychotic disorders are likely to have multiple background factors instead of a common factor, and note that numerous prior studies use composite indices to measure psychotic symptoms. These are the reasons why we used components for these constructs instead of generating latent factors (which is done in the standard SEM method). In contrast, we represented general intelligence as a common factor that determines the underlying covariance pattern of fluid and crystallized intelligence, based on the classical g theory of intelligence. We added this explanation in line 269~285.

      Moreover, during estimation, the IGSCA determines weights of each observed variable in such a way as to maximize the variances of all endogenous indicators and components. We added this explanation in the description about the IGSCA method (line 266~268).

      We deleted overly optimistic statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead, throughout our manuscript.

      Judd, N., Sauce, B., Wiedenhoeft, J., Tromp, J., Chaarani, B., Schliep, A., ... & Klingberg, T. (2020). Cognitive and brain development is independently influenced by socioeconomic status and polygenic scores for educational attainment. Proceedings of the National Academy of Sciences, 117(22), 12411-12418.

      Karcher, N. R., Schiffman, J., & Barch, D. M. (2021). Environmental Risk Factors and Psychotic-like Experiences in Children Aged 9–10. Journal of the American Academy of Child & Adolescent Psychiatry, 60(4), 490-500. doi:10.1016/j.jaac.2020.07.003

      Martin, J., Hamshere, M. L., Stergiakouli, E., O'Donovan, M. C., & Thapar, A. (2015). Neurocognitive abilities in the general population and composite genetic risk scores for attention‐deficit hyperactivity disorder. Journal of Child Psychology and Psychiatry, 56(6), 648-656.

      van Os, J., Linscott, R., Myin-Germeys, I., Delespaul, P., & Krabbendam, L. (2009). A systematic review and meta-analysis of the psychosis continuum: Evidence for a psychosis proneness–persistence–impairment model of psychotic disorder. Psychological Medicine, 39(2), 179-195. doi:10.1017/S0033291708003814

      It appears that citations and references are not always used correctly.

      We thoroughly checked all citations and specified the references for each statement. We deleted Plomin & von Stumm (2018) and Harden & Koellinger (2020) and cited relevant primary studies (e.g., Lee et al., 2018; Okbay et al., 2022; Abdellaoui et al., 2022) instead. We also specified the references supporting the statement that educational attainment PGS links to brain morphometry (Judd et al., 2020; Karcher et al., 2021). As Okbay et al. (2022) use PGS of cognitive intelligence (which mentions the analyses results in their supplementary materials) as well as educational attainment, we decided to continue citing this reference. These corrections can be found in line 131~141.

      Strengths of the results:

      The authors included a comprehensive array of analyses.

      We thank the reviewer for the positive comment.

      Weaknesses of the results:

      Many results, which are presented in the supplemental materials, are not referenced in the main text and are so comprehensive that it can be difficult to match tables to results. Some of the methodological questions make it challenging to assess the strength of the evidence provided in the results.

      As you rightly identified, we inadvertently failed to reference Table S2 in the main text. We have since corrected this omission in the Results section for the IGSCA (SEM) analysis (line 375). The remainder of the supplementary tables (Table S1, S3~S7) have been appropriately cited in the main manuscript. We recognize that the quantity of tables provided in the supplementary materials is substantial. However, given the comprehensiveness and complexity of our analyses, which encompass a wide array of study variables, these tables offer intricate results from each analysis. We deem these results, which include valuable findings from sensitivity analyses and confound testing, too significant to exclude from the supplementary materials. That said, we are open to, and would greatly welcome, any further suggestions on how to present our supplementary results in a more accessible and digestible format. We are ready and willing to implement any necessary modifications to ensure clarity and ease of comprehension. Your guidance in this matter is highly valued.

      Appraisal:

      The authors suggest that their findings provide evidence for policy reforms (e.g., targeting residential environment, family SES, parenting, and schooling). While this is probably correct, a range of methodological unclarities and ambiguities make it difficult to assess whether the current study provides evidence for that claim.

      Impact:

      The immediate impact is limited given the short follow-up period (2y), possible concerns about selection bias and attrition in the data, and some methodological concerns.

      We added as study limitations (lines 518-538) that the impact of our findings for understanding cognitive and psychiatric development during later childhood may be limited due to the relatively short follow-up period, the possibility of sample selection bias, and the problems of interpreting analysis results from an observational study as causal (despite the novel causal inference methods, designed for non-randomized, observational data, that we used).

      As noted in our responses above, we made the necessary corrections and clarifications for the points raised by the reviewer. We are willing to make additional revisions, so please feel free to comment if you feel that our corrections are insufficient or inappropriate.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      1. General Statements

      We thank the reviewers for their positive statements about the significance of our work.

      2. Point-by-point description of the revisions


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This paper contains a set of highly valuable information on the physicochemical parameters of betaine lipids - which are synthesized in microalgae and some other lower eukaryotic organisms.

      The authors, using advanced biophysical techniques - neutron diffraction and small-angle scattering (SANS) as well as molecular dynamics (MD) simulations - established key physicochemical parameters of synthetic betaine lipid DP-DGTS, and compared it with those of the DPPC phospholipid. They "show that DP-DGTS bilayers are thicker, more rigid, and mutually more repulsive than DPPC bilayers". These are important findings.

      The authors also analyzed the phylogenetic tree of the appearance and disappearance of DGTS biosynthesis enzymes, which - together with the observed "different properties and hydration response of PC and DGTS" led them to explain "the diversity of betaine lipids observed in marine organisms and for their disappearance in seed plants". The authors tentatively suggest "A physicochemical cause of betaine lipid evolutionary loss in seed plants" (Title with "?")

      We put a question mark because our work suggests that the difference in sensitivity to hydration between DGTS and PC bilayers could explain the disappearance of betaine lipids in seed plants, owing to the dry stage of the seed. In our hands, we never managed to obtain a 35S-BTA1 overexpressing plant that produces seeds. However, we do not have formal evidence for this. We propose to change the title to: “The possible role of lipid bilayer properties in the evolutionary disappearance of betaine lipids in seed plants”.

      My major concerns with this suggestion are:

      • In thylakoid membranes (TMs) the only phospholipid, PG, plays key roles in PSII and PSI functions (Wada and Murata 2007 Photosynth Res, Hagio et al. Plant Physiol 2000, Domonkos et al. 2004 Plant Physiol); it is difficult to explain how these roles would be taken over by betaine lipids. In fact, data of Huang et al. (https://www.sciencedirect.com/science/article/pii/S2211926418309366) indicate that "betaine lipids constitute the major compounds of non-plastidial membranes" and that a compensation mechanism operates "by the increase of PG in thylakoid membranes, suggesting a transfer of P from non-plastidial membranes to chloroplasts that would maintain a stable lipid composition of thylakoid membranes".
      • Although neutron diffraction and SANS data, as well as MD simulations, might indicate important differences, the behavior of membranes (e.g. stacking interactions, overall structure and structural dynamics of TMs, protein embedding conditions / membrane thickness etc.) is, in TMs, more dominantly determined by protein-protein interactions, mainly because these membranes contain only small areas occupied by the bilayer phase. Similar arguments hold true for the inner mitochondrial membranes (IMMs). I suggest taking these severe limitations into account when extrapolating the data and trying to reach general conclusions. In general, I suggest a more cautious interpretation of the data.

      We fully agree with the reviewer’s comments. We indeed wrote in the introduction: “In algae, under phosphate starvation, a situation commonly met in the environment, betaine lipids replace phospholipids in extraplastidic membranes. Because betaine lipids are localized in these membranes [11, 12] and share a common structural fragment with the main extraplastidic phospholipid phosphatidylcholine (PC) (Figure 1A and B), it can be speculated that these two lipid classes are interchangeable, but this was never demonstrated.”

      Plastidial membranes are mainly composed of the non-phosphorus glycerolipids MGDG, DGDG and SQDG. It is well known that in phosphate starvation, in plants and algae, the main phospholipid present in thylakoid membranes, PG, is replaced by SQDG because they are both anionic and bilayer forming lipids (Hölzl G, Dörmann P. Chloroplast Lipids and Their Biosynthesis. Annu Rev Plant Biol. 2019 Apr 29;70:51-81. doi: 10.1146/annurev-arplant-050718-100202; Endo K, Kobayashi K, Wada H. Sulfoquinovosyldiacylglycerol has an Essential Role in Thermosynechococcus elongatus BP-1 Under Phosphate-Deficient Conditions. Plant Cell Physiol. 2016 Dec;57(12):2461-2471; Van Mooy BA, Rocap G, Fredricks HF, Evans CT, Devol AH. Sulfolipids dramatically decrease phosphorus demand by picocyanobacteria in oligotrophic marine environments. Proc Natl Acad Sci U S A. 2006 Jun 6;103(23):8607-12.; Kobayashi K, Fujii S, Sato M, Toyooka K, Wada H. Specific role of phosphatidylglycerol and functional overlaps with other thylakoid lipids in Arabidopsis chloroplast biogenesis. Plant Cell Rep. 2015 Apr;34(4):631-42.). We recently showed by the same kind of neutron diffraction approaches that PG and SQDG share similar physicochemical properties that can explain their conserved replacement by each other in plastidial membranes (Bolik S, Albrieux C, Schneck E, Demé B, Jouhet J. Sulfoquinovosyldiacylglycerol and phosphatidylglycerol bilayers share biophysical properties and are good mutual substitutes in photosynthetic membranes. Biochim Biophys Acta Biomembr. 2022 Dec 1;1864(12):184037.). However, nothing is known about mitochondrial membranes and DGTS localization. Because PC is a major lipid component of mitochondria in plants and fungi and PC is absent in Chlamydomonas reinhardtii, mitochondrial membranes could contain DGTS, at least in Chlamydomonas.

      To clarify this statement, we added in the introduction the sentences: “Betaine lipid synthesis is located in the ER [13,14] and betaine lipids are expected to be absent in photosynthetic membranes [12]. Therefore, this PC-betaine lipid replacement is not expected to occur in photosynthetic membranes. However, it might occur at the surface of the chloroplast envelope where PC might be present [15–17]. Nothing is known about the composition of mitochondrial membranes in algae but because PC is a major lipid component in plant and fungal mitochondria, this replacement might also occur in mitochondria.” In the discussion, we replaced “cellular membrane” with “extraplastidial membrane”.

      A minor point - just to avoid possible misunderstanding: betaine can be present in large quantities in many photosynthetic organisms. A short statement on betaine would help.

      To avoid any confusion with betaine as a soluble molecule and betaine lipid, we added this sentence in the introduction: “The presence of betaine lipids is not linked to the synthesis of betaine, a soluble compound present in almost every organism including most animals, plants, and microorganisms, acting as protectant against osmotic stress [22].”

      **Referee cross-commenting**

      I agree with the evaluation of Reviewer #2 - while keeping mine

      Reviewer #1 (Significance (Required)):

      The physico-chemical properties of betaine lipids have not been established. These lipids - under P starvation of microalgae - accumulate in large quantities. Thus, their detailed characterization and comparison to (otherwise similar) phospholipids are of high importance and advance our knowledge about the roles of these lipids and the organization and structural / functional plasticity of biological membranes.

      As outlined above, I suggest a more cautious interpretation of the data and conclusions regarding e.g. the energy-converting membranes.

      I think the audience is relatively broad: (i) basic research of lipid models and (ii) methodology as well as calling the attention of membrane biologists to the scarcely studied betaine lipids.

      My field is the biophysics of photosynthesis - the stability and plasticity of the oxygenic photosynthetic machinery at different levels of complexity; the area closest to this topic is the polymorphic lipid phase behavior of plant TMs.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This manuscript nicely presents the effect of phosphate depletion on how betaine lipids function as effective replacements in a water-rich environment. The mix of computational and wet lab experiments provides details on membrane structure and general effects when phospholipids are changed to betaine lipids. I found this manuscript easy to read and understand and is worthy of publishing. However, I do have a few minor comments below to improve the manuscript.

      Minor Comments:

      1. Phases in PC lipids with saturated tails: The authors present a gel to liquid crystalline phase change for DPPC at 40 °C. However, this is at the ripple-liquid crystalline phase transition and the gel doesn't occur until about 34-35 °C. This should be noted in the manuscript.

      We have completed the sentence in the first Results section as follows: “The DSC data show a sharp phase transition at 40.2 ± 0.1°C for DPPC corresponding to the transition between the ripple phase and the fluid phase, which is consistent with earlier reports on DPPC large unilamellar vesicles [25].”

      Page 4: I am confused with the following phrase: "indicating either weak cooperativity between lipid bilayers or that phase co-existence is not a thermodynamic disadvantage, while this phenomenon is not observed for DPPC bilayers." What is meant by phase co-existence is not a thermodynamic disadvantage? Could this also be due to some frustration in phase coexistence and the presence of a ripple phase that kinetically is inhibited and thus a sharp transition is not observed?

      We did not observe a ripple phase in DP-DGTS, as defined for DPPC bilayers, by DSC, neutron diffraction, or SANS experiments. We don’t know if it exists in DP-DGTS bilayers. What we observe in neutron diffraction is a coexistence of gel phase and fluid phase domains in oriented multilayer films of DP-DGTS over a wide range of humidity, whereas for DPPC we observe only a gel phase or a fluid phase. Because the thicknesses of the DP-DGTS bilayers are not so different between the gel phase and the fluid phase, we suppose that the free energy difference between the two phases is very small over a wide osmotic pressure range, which could explain the broad phase transition.

      To further clarify our point, we have reworded the sentence in the following way: “As seen in Figure 2A, by increasing the humidity, DPPC molecules transit from the gel to the fluid phase via a ripple phase through a narrow window of osmotic pressures as previously reported [30,31]. In contrast, DP-DGTS bilayers show a phase coexistence that can be observed over a wide P-range and without the appearance of a third phase that could be attributed to a distinct ripple phase (Figure 2B) before forming a single fluid phase at high humidity (i.e., at low P). Based on DSC and neutron diffraction as two independent techniques, we can safely conclude that the phase transition for DP-DGTS is broad. This observation indicates that the free energy difference between the two phases is very small over a wide osmotic pressure range and may be connected to the shapes of the pressure-distance relations in the two phases, which are discussed further below.” We also added in the legend of figure 4 (SANS experiment): “No ripple phase Pβ′ was detected for DP-DGTS bilayers.”

      DOI for computational methods: The DOI listed for the computational files (https://doi.org/10.18419/darus-2360) does not work.

      Unfortunately, we did not ask for publication of the URL upon submission of the manuscript, and we thank the reviewer for carefully checking this. Since DaRUS is a peer-reviewed repository ensuring high quality data sets according to the FAIR principle, peer review is still ongoing. The provided link will work only once the manuscript is published. In the meantime, we provide a temporary link for reviewing:

      https://darus.uni-stuttgart.de/privateurl.xhtml?token=cbfac341-0e4a-4403-8f73-87bce31ca805

      Reviewer #2 (Significance (Required)):

      This work has broad significance and would be of general interest to those in fields from membrane biophysics to plant biology and evolution. The work nicely touches on all these topics, and I find that it fills a gap in the details of the structure of these betaine lipids and its relation to evolution in terrestrial vs. marine plants.

    1. Because here’s something else that’s weird but true: in the day-to-day trenches of adult life, there is actually no such thing as atheism. There is no such thing as not worshipping. Everybody worships. The only choice we get is what to worship.

      I find this to be true because, in reality, as much as some people may not believe in worshipping anything, be it spiritual, supernatural, or anything of the sort, people in practice believe in or worship things that keep them going. For instance, one may find himself or herself in a critical situation with no certainty of how to get out of it, and may wish to get out of that situation without having anyone in mind, simply believing; if the wish comes to pass, the act of wishing alone was a prayer. Just as Wallace mentions, atheism does not exist, since people find themselves worshipping various things: whatever we humans dedicate so much of our time to that we believe we cannot do without it (worthy) is actually a form of worship. This includes money, spending much time with TV, social media, and the like. Hence, we give reverence to these things, which makes us prisoners of our own selves. Therefore, as humans, we should learn to be conscious of what is real and important, so we can control how we think and make choices.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript reports new findings about the role of the glutamate transporter EAAC1 in controlling neural activity in the striatum. The significance is two-fold - it addresses gaps in knowledge about the functional significance of EAAC1, as well as provides a potential explanation for how EAAC1 mutations contribute to striatal hyperexcitability and OCD-associated behaviors. The manuscript is clearly presented, and the well-designed experiments are rigorously performed and analyzed. The main results showing that EAAC1 deletion increases the dendritic arbor of MSN D1 neurons and increases excitatory synaptic connectivity, as well as reduces D1-to-D1 mediated IPSCs are convincing. These results clearly demonstrate that EAAC1 deletion can alter excitatory and inhibitory synaptic function. Modelling the potential consequences for these changes on D1 MSN neural activity, and the behavior changes are interesting. Minor weaknesses include incomplete support for the conclusions about how EAAC1 regulates GABAergic transmission.

      We would like to take this opportunity to thank the reviewer. New sets of pharmacology experiments now address the minor concern about supporting the conclusions about the regulation of GABAergic transmission by EAAC1. The revised manuscript also includes new behavioral assays that allow us to examine in more depth the cell- and region-specificity of the effects of EAAC1.

      Reviewer #2 (Public Review):

      The manuscript by Petroccione et al., examines the modulatory role of the neuronal glutamate transporter EAAC1 on glutamatergic and GABAergic synaptic strength at D1- and D2-containing medium spiny neurons within the dorsolateral striatum. They find that pharmacological and genetic disruption of EAAC1 function increases glutamatergic synaptic strength specifically at D1-MSNs. They show that this is due to a structural change in release sites, not release probability. They also show that EAAC1 is critical in maintaining lateral inhibition specifically between D1-MSNs. Taken together, the authors conclude that EAAC1 functions to constrain D1-MSN excitation. Using a computational modeling technique, they posit that EAAC1's modulatory role at glutamatergic and GABAergic inputs onto D1-MSNs ultimately manifests as a reduction of gain of the input-output firing relationship and increases the offset. They go on to show that EAAC1 deletion leads to enhanced switching behavior in a probabilistic operant task. They speculate that this is due to a dysregulated E/I balance at D1-MSNs in the DLS. Overall, this is a very interesting study focused on an understudied glutamate transporter. Generally, the study is done in a very thorough and methodical manner and the manuscript is well written.

      We thank the reviewer for the thorough analysis and insightful comments on the manuscript. Our point-by-point responses to the concerns raised on the initial submission of this work are reported below:

      Major Comments/Concerns:

      Regional/Local manipulations in behavior study: The manuscript would be greatly improved if they provided data linking the ex vivo electrophysiological findings within the DLS with the behavior. Although they are using a DLS-dependent task, they are nonetheless, using a constitutive EAAC1 KO mouse. Thus, they cannot make a strong conclusion that the behavioral deficits are due to the EAAC1 dysfunction in the DLS (despite the strong expression levels in the DLS).

      Corrected - We concur with the reviewer. To address this concern, we performed new experiments to assess the cell- and regional-specificity of the effects of EAAC1 on task-switching behaviors.

      First, we repeated the behavioral assays described in Fig. 8 in two mouse lines (D1Cre/+:EAAC1f/f and A2ACre/+:EAAC1f/f) lacking EAAC1 expression in D1- or D2-MSNs, respectively (Supp. Fig. 8-1). As in the case of EAAC1+/+ and EAAC1-/- mice, when the switch time was short (<15 s), D1Cre/+:EAAC1f/f and A2ACre/+:EAAC1f/f mice collected a similar number of rewards (Supp. Fig. 8-1K, L) and performed a similar number of lever presses (Supp. Fig. 8-1M, N). As the switch time increased (30-75 s), D1Cre/+:EAAC1f/f mice collected more rewards than A2ACre/+:EAAC1f/f mice, at low and high reward probabilities (Supp. Fig. 8-1L, N). Overall, the task switching behavior of D1Cre/+:EAAC1f/f mice was similar to that of EAAC1-/- mice, whereas that of A2ACre/+:EAAC1f/f mice was similar to that of EAAC1+/+ mice (cf. Supp. Fig. 8 and Supp. Fig. 8-1). This suggests that loss of expression of EAAC1 from D1-MSNs is sufficient to reproduce the task switching behavior of EAAC1-/- mice. Because EAAC1 limits excitation onto D1-MSNs (Fig. 2, 3) and lateral inhibition between D1-MSNs (Fig. 4-6), these findings suggest that increased excitation onto D1-MSNs and reciprocal inhibition among D1-MSNs limit execution of reward-based behaviors with task-switching intervals >30s.

      Second, as noted by the reviewer, another potential limitation of the experiments performed on constitutive EAAC1-/- mice is that, on their own, they do not allow us to say whether the behavioral effects are due to changes in E/I onto D1-MSNs within a specific domain of the striatum like the DLS. Although the DLS is recruited during task-switching, reward-based flexibility in executive control relies on neuronal activity in the VMS (Wallis 2007; Gu et al. 2008). Therefore, we asked whether limiting excitation in D1-MSNs and strengthening D1-D1 lateral inhibition via EAAC1 in the VMS could also alter reward-based task-switching behaviors. To address this question, we repeated the task switching test in EAAC1f/f mice that received stereotaxic injections of a Cre-dependent viral construct (AAV-D1Cre) that we used to remove EAAC1 expression from D1-MSNs in the DLS or VMS, respectively (Supp. Fig. 8-2). The results showed that the task switching behaviors of EAAC1f/f mice receiving AAV-D1Cre injections in the DLS or VMS were similar to each other and to those of EAAC1-/- mice, while being statistically different from those of EAAC1+/+ mice. This finding is important, as it suggests that: (i) the DLS and VMS are both recruited for the execution of task switching behaviors; (ii) the modulation of E/I onto D1-MSNs by EAAC1 may not be limited to the DLS but could extend to the VMS.

      Third, we performed further tests to examine the regional-specificity of the effects of EAAC1 in D1-MSNs. D1 receptor expressing cells are present not only throughout the striatum, but also in the substantia nigra (pars compacta and reticulata; SN) and ventral tegmental area (VTA) (Cadet et al. 2010; Savasta, Dubois, and Scatton 1986; Boyson, McGonigle, and Molinoff 1986; Wamsley et al. 1989). To determine whether lack of EAAC1 in D1-expressing cells in the SN/VTA could also contribute to increased compulsivity, we repeated the task switching behavioral assays in EAAC1f/f mice that received injections of AAV-D1Cre in the SN/VTA (Supp Fig. 8-3). The task switching behavior of these mice was similar to that of EAAC1+/+, not EAAC1-/- mice, suggesting that altering EAAC1 expression in D1-MSNs of the DLS/VMS, but not the SN/VTA, is implicated in the control of task switching of reward-based behaviors in mice.

      The results of these new sets of experiments are included in the revised version of the manuscript and their implications are reported in the Discussion section of the paper.

      Statistics used in the study: There are some missing details regarding the precise stats used for the different comparisons. I am particularly concerned that the electrophysiology studies that were a priori designed as a 2-factor analysis did not have 2-way ANOVAs performed, but rather a series of t-tests. For example, in Figure 3b, the two factors are 1) cell type and 2) genotype. Was a 2-way ANOVA performed? It is hard for me to tell from the text.

      Corrected - We apologize for any potential confusion. The statistical analysis for the experiments included in this work includes paired and unpaired t-tests, one-way ANOVA, two-way ANOVA, and ANOVA for repeated measures tests followed by post hoc t-test comparisons (reported in the text). To ensure both accuracy and readability of the manuscript, we report the results of the statistical comparisons in the main text of the manuscript, but also provide a fully detailed statistical analysis across all datasets performed in the data repository for this manuscript deposited on Open Science Framework. We revised the methods section to clarify the use of different statistical tests and values reported in the manuscript.

      Moderate Concerns:

      Control mice: I am moderately concerned that littermates were not used for controls for the EAAC1 KO, but rather C57Bl/6NJ presumably ordered from a vendor. It has been shown that issues like transit and rearing conditions can have long term effects on behavior. Were the control mice reared in house? How long was the acclimation time before use?

      Corrected - Sorry for the potential confusion. The EAAC1-/- mice are bred in house and have been backcrossed with C57BL/6J for more than 10 generations. We perform backcrossing regularly and routinely in our animal colony. The C57BL/6J mice are also bred in house. They are replaced every 10 generations to avoid genetic drift. Therefore, there is no concern about transit from vendors and rearing affecting the results of our experiments. This information has been added to the Methods section of the paper.

      OCD framework: I generally find the OCD framework unnecessary, particularly in the Introduction. Compulsive behaviors are not restricted to OCD. Indeed, the link between the behavioral observations and OCD phenotype seems a bit tenuous. In addition, studying the mechanisms of behavioral flexibility in and of itself is interesting. I do not think such a strong link needs to be made to OCD throughout the entirety of the paper. The authors should consider tempering this language or restricting it to the discussion and end of the abstract.

      Corrected - We concur with the reviewer and have revised the manuscript accordingly. At the end of the Abstract, we refer only to behavior flexibility. We have toned down our emphasis on OCD in the Introduction, broadening the genetic link between the gene encoding EAAC1 (SLC1A1) and neuropsychiatric diseases like OCD, ADHD and ASD. This is now limited to a single sentence. We also revised the Discussion section because we agree with the reviewer on the fact that compulsive behaviors are not limited to OCD.

    1. Author Response

      Reviewer #2 (Public Review):

      1) The authors in reality do not analyze oscillations themselves in this manuscript but only the power of signals filtered at determined frequency bands. This is particularly misleading when the authors talk about "spindles". Spindles are classically defined as a thalamico-cortical phenomenon, not recorded from hippocampus LFPs. Thus, the fact that you filter the signal in the same frequency range matching cortical spindles does not mean you are analyzing spindles. The terminology, therefore, is misleading. I would recommend the authors to change spindles to "beta", which at least has been reported in the hippocampus, although in very particular behavioral circumstances. However, one must note that the presence of power in such bands does not guarantee one is recording from these oscillations. For example, the "fast gamma" band might be related to what is defined as fast gamma nested in theta, but it might also be related to ripples in sleep recordings. The increase of "spindle" power in sleep here is probably related to 1/f components arising from the large irregular activity of slow wave sleep local field potentials. The authors should avoid these conceptual confusions in the manuscript, or show that these band power time courses are in fact matching the oscillations they refer to (for example, their spindle band is in fact reflecting increased spindle occurrence).

      We thank the reviewer for allowing us to clarify this subject. We completely agree with concerns raised in the comments. To avoid any confusion, we have replaced throughout the manuscript the word ‘spindle’ with ‘beta’.

      2) The shuffling procedure to control for the occupancy difference between awake and sleep does not seem to be sufficient. From what I understand, this shuffling is not controlling for the autocorrelation of each band which would be the main source of bias to be accounted for in this instance. Thus, time shifts for each band would be more appropriate. Further, the controls for trial durations should be created using consecutive windows. If you randomly sample sleep bins from distant time points you are not effectively controlling for the difference in duration between trial types. Finally, it is not clear from the text if the UMAP is recomputed for each duration-matched control. This would be a rigorous control as it would remove the potential bias arising from the unbalance between awake and sleep data points, which could bias the subspace to be more detailed for the LFP sleep features. It is very likely the results will hold after these controls, given it is not surprising that sleep is a more diverse state than awake, but it would be good practice to have more rigorous controls to formalize these conclusions.

      We are grateful to the reviewer for suggesting alternative analyses. Following this suggestion, we created surrogate datasets by time shifting each band and obtained their respective UMAP projections (see modified Figure 2D). Additionally, as suggested, for duration-matched controls, we selected consecutive windows rather than random points (Figure 2 – figure supplement 1C). UMAP projections were obtained for each duration-matched control and occupancy was computed. The text in the Methods section has been modified to describe the analysis. As expected, the results were identical.
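
      The time-shift control discussed in this exchange can be illustrated with a minimal sketch. It is not the authors' code: the data layout (a dict mapping a band name to a list of power samples), the function names, and the `min_shift` parameter are assumptions made for illustration. Each band is rotated circularly by its own random offset, which preserves that band's autocorrelation while breaking its alignment with the other bands; a helper picks one contiguous window for a duration-matched control.

```python
import random


def circular_shift(series, shift):
    """Rotate a sequence to the right by `shift` samples, wrapping around."""
    n = len(series)
    if n == 0:
        return []
    k = shift % n
    if k == 0:
        return list(series)
    return list(series[-k:]) + list(series[:-k])


def time_shift_surrogate(band_power, min_shift, rng):
    """Shift every band by an independent random offset.

    band_power: dict of band name -> list of power values, all the same
    length (hypothetical layout, for illustration only).
    min_shift: smallest allowed offset in samples (assumed >= 1), so that
    the surrogate is never identical to the original alignment.
    """
    n = len(next(iter(band_power.values())))
    if 2 * min_shift > n:
        raise ValueError("min_shift is too large for the series length")
    surrogate = {}
    for name, series in band_power.items():
        shift = rng.randrange(min_shift, n - min_shift + 1)
        surrogate[name] = circular_shift(series, shift)
    return surrogate


def consecutive_window(n_samples, window_len, rng):
    """Return (start, stop) indices of one contiguous window of samples."""
    start = rng.randrange(0, n_samples - window_len + 1)
    return start, start + window_len
```

      In such a scheme, the embedding would then be recomputed on each surrogate or window, which is the step the reviewer asked to have made explicit.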

      3) Lots of the observations made from the state space approach presented in this manuscript lack any physiological interpretation. For example, Figure 4F suggests a shift in the state space from Sleep1 to Sleep2. The authors comment there is a change in density but they do not make an effort to explain what the change means in terms of brain dynamics. It seems that the spectral patterns are shifting away from the Delta X Spindle region (concluding this by looking at Fig4B) which could be potentially interesting if analyzed in depth. What is the state space revealing about the brain here? It would be important to interpret the changes revealed by this method otherwise what are we learning about the brain from these analyses? This is similar to the results presented in Figure 5, which are merely descriptions of what is seen in the correlation matrix space. It seems potentially interesting that non-REM seems to be split into two clusters in the UMAP space. What does it mean for REM that delta band power in pyramidal and lm layers is anti-correlated to the power within the mid to fast gamma range? What do the transition probabilities shown in Figures 6B and C suggest about hippocampal functioning? The authors just state there are "changes" but they don't characterize these systematically in terms of biology. Overall, the abstract multivariate representation of the neural data shown here could potentially reveal novel dynamics across the awake-sleep cycle, but in the current form of this manuscript, the observations never leave the abstract level.

      We thank the reviewer for allowing us to clarify this aspect of the manuscript. We have now edited the main text to include considerations on the biological relevance of the findings of Figure 4, 5 and 6.

      Additions to figure 4: In particular, non-REM states in sleep2 tended to concentrate in a region of increased power in the delta and beta bands, which could be the results of increased interactions with cortical activity modulated in the same range. It is also likely that such effect was induced by the exposure to relevant behavioral experience. In fact, changes in density of individual oscillations after learning have been reported using traditional analytical methods and are thought to support memory consolidation (Bakker et al., 2015; Eschenko et al., 2008, 2006). Nevertheless, while traditional methods provide information about individual components, the novel approach used here provides additional information about the combinatorial shift in the dynamics of network oscillations after learning or exploration. Thus, it provides the basis for identifying how coordinated activity among different oscillations supports memory consolidation processes, as those occurring during non-REM sleep after exploration, which cannot be elucidated using traditional analytical methods.

      Additions to figure 5: Gamma segregation and delta decoupling offer a picture of hippocampal REM sleep as being more akin to awake locomotion (with the major difference of a stronger medium gamma presence) while also suggesting a substantial independence from cortical slow oscillations. On the other hand, the across-scale coherence of non-REM sleep is consistent with this sleep stage being dominated by brain-wide collective fluctuations engaging oscillations at every range. Distinct cross-frequency coupling among various individual pairs of oscillations, such as theta-gamma and delta-gamma, has already been reported (Bandarabadi et al., 2019; Clemens et al., 2009; Hammer et al., 2021; Scheffzük et al., 2011). However, computing cross-frequency coupling on the state space provides additional information on how multiple oscillations, obtained from distinct CA1 hippocampal layers (stratum pyramidale, stratum radiatum and stratum lacunosum moleculare), are coupled with each other during distinct states of sleep and wakefulness. Furthermore, projecting the correlation matrices onto a 2D plane provides a compact tool for visualizing the cross-frequency interactions among various hippocampal oscillations. Altogether, this approach reveals the complex nature of coupling dynamics occurring in the hippocampus during distinct behavioral states.
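As a rough illustration of the kind of cross-frequency correlation matrix discussed here, the following is a minimal sketch, not the authors' pipeline. The layer and band names, the toy data and the flattening into a feature vector are all assumptions made for the example; the actual analysis may use different bands, normalization and a different embedding step.

```python
import numpy as np

# Hypothetical channel names: one power time series per (layer, band) pair.
names = ["pyr_delta", "pyr_gamma", "lm_delta", "lm_gamma"]

rng = np.random.default_rng(0)
n_bins = 500

# Toy data: delta power in both layers follows a shared slow component,
# while gamma power moves in the opposite direction (anti-correlated).
shared = rng.standard_normal(n_bins)
power = np.column_stack([
    shared + 0.3 * rng.standard_normal(n_bins),   # pyr_delta
    -shared + 0.3 * rng.standard_normal(n_bins),  # pyr_gamma
    shared + 0.3 * rng.standard_normal(n_bins),   # lm_delta
    -shared + 0.3 * rng.standard_normal(n_bins),  # lm_gamma
])

# Pearson correlation between every pair of power time series.
# rowvar=False: each column is one variable, each row one time bin.
corr = np.corrcoef(power, rowvar=False)

# The upper triangle (without the diagonal) is a compact feature vector
# that could be passed to a 2D embedding method of one's choice.
iu = np.triu_indices(len(names), k=1)
features = corr[iu]
```

In this toy case the delta-delta entry is positive and the delta-gamma entry is negative, which is the kind of pattern a correlation-matrix view makes visible across many bands at once.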

      Additions to Figure 6: We found that transitions occurring from REM-to-REM sleep and non-REM-to-non-REM sleep (intra-state transitions) are more vulnerable to plasticity after exploration as compared to inter-state transitions (such as non-REM-to-REM, REM-to-intermediate etc.) (Fig 6E, F). These changes in intra-state transitions were observed to be beyond randomness (Fig S9 E, F), indicating a specificity in plastic changes in state transitions after exploration. In particular, while the average REM period duration is unaltered after exploration (Fig 4G), REM temporal structure is reorganized. In fact, the increased probability of REM-to-REM transitions indicates a significant prolongation of REM bout duration. Similarly, the increase in non-REM-to-non-REM transition probability reflects an increased duration of non-REM bouts. Therefore, environment exploration was accompanied by an increased separation between REM and non-REM periods, possibly as a response to increased computational demands. More generally, the network state space allows us to characterize the state transitions in the hippocampus and how they are affected by novel experience or learning. By observing the state transition patterns, this analytical framework makes it possible to detect and identify state-specific changes in the hippocampal oscillatory dynamics, beyond the possibilities offered by more traditional univariate and bivariate methods. We next investigated how fast the network flows on the state space and assessed whether the speed is uniform, or it exhibits specific region-dependent characteristics.

      Reviewer #3 (Public Review):

      1) My primary concern is to provide clear evidence that this approach will provide key insights of high physiological significance, especially for readers who may think the traditional approaches are advantageous (for example due to their simplicity). I think the authors' findings of distinct sleep state signatures or altered organization of the NLG3-KO mouse could serve this purpose. However, right now the physiological significance of these results is unclear. For example, do these sleep state signatures predict later behavior performance, or is altered organization related to other functional impairments in the disease model? Do neurons with distinct sleep state signatures form distinct ensembles and code for related information?

      We are thankful to the reviewer for raising a very interesting line of questioning regarding sleep signatures and distinct ensembles. In this study, we show that sleep state signatures can predict how individual cells may participate in information processing during open field exploration. However, further analyses exploring the recruitment of neuronal ensembles are in preparation for another manuscript and are beyond the scope of this article.

      We have further modified the description of the results (as also suggested by other reviewers) to highlight the key advantages of this approach over traditional methods.

      Regarding functional impairment: as described in the manuscript, the altered organization in the animal model of autism could possibly be due to alterations in cellular and synaptic mechanisms such as those described in previous reports (Modi et al 2019, Foldy et al 2013).

      2) For cells with different mean firing rates during exploration: is that because they are putative fast-spiking interneurons and pyramidal cells? From the reported mean firing rates, I think some of these cells are interneurons. Since mean firing rates are well known to vary with cell type, this should be addressed. For example, the sleep state signatures may be distinct for different putative pyramidal cells and interneurons. This would be somewhat expected considering prior work that has shown different cell types have different oscillatory coupling characteristics. I think it would be more interesting to determine if pyramidal cells had distinct sleep state signatures and, if so, whether pyramidal cells from the same sleep state signature have similar properties like they code for similar things or commonly fire together in an ensemble ms the number of cells in Fig. 8 may be limited for this analysis. The authors could use the hc-11 data in addition, which was also tested in this work.

      We thank the reviewer for suggesting this additional analysis to better describe the data. To this end, we have added an additional figure in the supplementary data (analysis of the hc11 dataset: Figure 8 – figure supplement 3) to demonstrate that interneurons and pyramidal cells have distinct sleep signatures. These findings are in agreement with the dataset presented in Figure 8D, E.

      As shown in the manuscript, the spatial firing (sparsity) has large variability for cells having similar network signatures (Fig 8E). Thus, additional parameters besides oscillations may be involved in cell encoding. Different network state spaces will need to be explored in future studies to understand this phenomenon in detail.

      We agree that investigating neuronal ensembles and the state space is an interesting direction to follow. In another study (in preparation), we are investigating in detail the recruitment of neuronal ensembles by the oscillatory state space. Thus, those findings are beyond the scope of this introductory article.

      3) Example traces are needed to show how LFPs change over the state-space. Example traces should be included for key parts of the state-space in Figures 2 and 3.

      We thank the reviewer for this key insight on data representation. Example traces of how LFP varies on the state space have been added (see Figure 4 – figure supplement 1).

      4) What is the primary rationale for 200ms time bins? Is this time scale sufficient to capture the slow dynamics of delta rhythm (1-5Hz) with a maximum of 1s duration?

      The time scale of binning depends on the scale of investigation. We also replicated the results with different time bins (such as 50 ms and 1 second) and the results are identical. For delta rhythms, with 200 ms time bins, the dynamics will be captured across multiple bins. Additionally, the binned power time series are also smoothed before obtaining projections.
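The binning and smoothing step described in this response can be sketched as follows. This is an illustrative assumption, not the authors' code: the sampling rate, the boxcar (moving-average) smoother and its width, and the function name `bin_and_smooth` are all hypothetical. It shows why a slow rhythm is still represented with 200 ms bins: one 1 Hz cycle spans five consecutive bins.

```python
import numpy as np

def bin_and_smooth(power, fs, bin_s=0.2, smooth_bins=5):
    """Average a power time series in fixed bins, then smooth across bins.

    power       : 1D array of instantaneous band power
    fs          : sampling rate in Hz
    bin_s       : bin width in seconds (0.2 s = 200 ms)
    smooth_bins : width of the moving-average kernel, in bins (assumed value)
    """
    power = np.asarray(power, dtype=float)
    samples_per_bin = int(round(bin_s * fs))
    n_bins = power.size // samples_per_bin
    # Drop the incomplete trailing bin, then average within each bin.
    trimmed = power[: n_bins * samples_per_bin]
    binned = trimmed.reshape(n_bins, samples_per_bin).mean(axis=1)
    # Moving-average (boxcar) smoothing across neighbouring bins.
    kernel = np.ones(smooth_bins) / smooth_bins
    smoothed = np.convolve(binned, kernel, mode="same")
    return binned, smoothed

fs = 1000                      # Hz, hypothetical sampling rate
t = np.arange(10 * fs) / fs    # 10 s of samples
# Toy power envelope that fluctuates slowly (0.5 Hz modulation).
power = 1.0 + 0.5 * np.sin(2 * np.pi * 0.5 * t)

binned, smoothed = bin_and_smooth(power, fs)
```

Changing `bin_s` to 0.05 or 1.0 reproduces the other bin widths mentioned in the response; only the temporal resolution of the resulting series changes.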

      5) Since oscillatory frequency and power are highly associated with running speed, how does speed vary over the state space. Is the relationship between speed and state-space similar to the results of previous studies for theta (Slawinska and Kasicki, Brain Res 1998; Maurer et al, Hippocampus 2005) and gamma oscillations (Ahmed and Mehta J. Neurosci 2012; Kemere et al PLOS ONE 2013), or does it provide novel insights?

      We thank the reviewer for highlighting this crucial link between oscillations and locomotion. While various articles have focused on individual oscillations, the combinatorial effects of multiple oscillations from multiple brain areas in regulating the speed of the animal during exploration are definitely worth exploring with this novel approach. This set of results will be introduced in another study, currently in preparation.

      6) The separation of 9 states (Fig. 6ABC) seems arbitrary, where state 1 (bin 1) is never visited. I suggest plotting the density distribution of the data in Fig. 2A or Fig. 6A to better determine how many states are there within the state space. For example, five peaks in such a density plot might suggest five states. Alternately, clustering methods could be useful to determine how the number of states.

      We thank the reviewer for this useful suggestion. We agree that additional clustering methods can be used to identify non-canonical sleep states. These are currently being explored in our lab and will be part of future studies. As for this dataset, the density plots are available in Figure 4E, which shows how many states are in each part of the state space.

      7) The results in Fig. 4G are very interesting and suggest more variation of sub-states during non REM periods in sleep1 than in sleep2. What might explain this difference? Was it associated with more frequent ripple events occurring in sleep2?

      The reviewer is right in looking for the source of the decrease in state variability in sleep2. Considering the distribution of relative frequency power in the state space, the higher concentration in sleep2 corresponds to higher content in the slower delta and spindle frequency bands, rather than the higher frequencies of SWRs. This result can be interpreted in the light of enhanced cortical activity (which is known to heavily recruit those bands) and possibly of enhanced cortical-hippocampal communication following relevant behavioral experience. It is also necessary to mention that with our recording setup we cannot completely rule out the effects of volume conduction, and thus we cannot exclude that the increase in the delta and spindle bands in the hippocampus was a spurious effect of purely cortical frequency modulations.

      8) The state transition results in Fig. 6 are confusing because they include two fundamentally different timescales: fast transitions between oscillatory states and slow dynamics of sleep states. I recommend clarifying the description in the results and the figure caption. Furthermore, how can an animal transition between the same sleep state (Fig. 6EF)? Would they both be in a single sleep state?

      The transitions capture the fast oscillatory scales (as they are investigated over a timeframe of 1 second). The sleep stages (REM, non-REM etc.) are used as labels from which the states originate on the state space. This allows us to characterize fast oscillatory dynamics in various sleep stages.

      Regarding same-state transitions: An increase in same-state transition probability corresponds to a prolongation of that particular state, thereby altering the temporal structure of a given sleep state.
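The link between same-state transition probability and bout duration can be made concrete with a small sketch. It assumes, purely for illustration, that the labelled state sequence behaves like a first-order Markov chain sampled at a fixed time step; under that assumption a state with self-transition probability p has an expected bout length of 1 / (1 - p) steps. The label sequence and function names below are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def transition_matrix(labels, n_states):
    """Row-normalized matrix of transition probabilities between states."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions are left as zeros.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

def expected_bout_length(p_stay):
    """Mean number of consecutive steps spent in a state (geometric law)."""
    return 1.0 / (1.0 - p_stay)

# Toy label sequence: 0 = non-REM, 1 = REM (hypothetical coding).
labels = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
P = transition_matrix(labels, n_states=2)

# Raising the self-transition probability lengthens the expected bout.
short_bout = expected_bout_length(0.5)
long_bout = expected_bout_length(0.9)
```

So an increase in REM-to-REM probability, with the time step held fixed, translates directly into longer expected REM bouts even when the total time spent in REM is unchanged.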

    1. When we don’t think certain messages meet our needs, stimuli that would normally get our attention may be completely lost. Imagine you are in the grocery store and you hear someone say your name. You turn around, only to hear that person say, “Finally! I said your name three times. I thought you forgot who I was!” A few seconds before, when you were focused on figuring out which kind of orange juice to get, you were attending to the various pulp options to the point that you tuned other stimuli out, even something as familiar as the sound of someone calling your name.

      This happens with my boyfriend and I all of the time. He will be playing a video game or on his phone, and when I try to get his attention, this happens. I also thought it was because he was tuning me out on purpose or something. I also heard that humans are not meant to focus their attention on multiple things at once, so this makes sense. I think that this concept is super interesting and now I know why people do this.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity:

      Summary: the paper suggested a new approach to study in vivo possible interaction between glioblastoma cells and glioblastoma associated macrophages. By using single cells transcriptome profiling and in vitro and in vivo functional experiments the authors also suggested LGALS1 as possible key factor in the suppression of the immune system and a new target for immune modulation in glioma patients. The experimental plan is well described, and the results are beautifully presented using images, clear drawings, and videos.

      Major comments: none

      Minor comments:

      • The number of zebrafish embryos analyzed after the xenograft is highly variable (e.g. 3-18; 4-22 in Figure 6). These numbers can be reported in the results section (not only in the legends) and the authors may comment on them in the discussion. The reproducibility of the xenotransplant experiments is always challenging as it is quite difficult to inject the same number of cells in every embryo and to have the same survival rate of injected cells and of transplanted embryos. For these reasons the volume of each xenograft can vary significantly in different embryos and in different experimental sessions. Accordingly, the number of macrophages associated with the tumor can vary and the statistical analysis can be deeply influenced by the number of replicates for each experimental group (a group with 3 embryos is very different in terms of quality and quantity of information with respect to a group of 18 embryos). It could be useful for the reader, who has no experience in this technique, to be aware of the advantages and disadvantages of the procedure, including the possible influence of the temperature (34°C instead of 37°C) on the embryo survival and the replication rate of glioma cells or macrophage behavior. Commenting on these aspects does not weaken the power and the relevance of the model but unveils the critical aspects that every scientist has to evaluate before planning these kinds of experiments.

      Response: We agree with Reviewer #1 that the zebrafish avatar model is challenging, and it is difficult to obtain reproducible tumor sizes and survival rates. To be even more transparent about this, we have added a few sentences about the variable n number in the Results section and a critical comment about it in the Discussion section.

      • An aspect that could be interesting to address, to further validate the avatar model, is to monitor the level of pro-inflammatory cytokines (Tumor Necrosis Factor and Interleukin 1, 6, and 8) that are expressed at basal level in the early developing zebrafish embryos. Do their expression level increase after the xenotransplantation? Can the zebrafish cytokines affect the behavior of glioma associated macrophages (i.e. macrophages polarization)?

      Response: This is an interesting point, indeed. We have injected murine melanoma (B16) cells into Tg(mpeg1:mCherry-F); Tg(TNFa:eGFP-F) embryos, a TNFa reporter line. Some (but not all) macrophages expressed TNFa and their expression decreased over time, which is consistent with previous reports (Póvoa et al, 2021). We further observed that TNFa-expressing macrophages mostly had a round, “tumor-attacking” phenotype. This is in line with our hypothesis that the tumor induces a phenotype switch in GAMs. Of note, we did not see TNFa expression in the rest of the brain tissue. We would be happy to add this data if deemed useful.

      We did not investigate other cytokines in the developing zebrafish, but we believe this is not essential for the following reasons: We are mainly interested in the differences between the patient-derived GBM stem cell cultures (GSCCs), and since they are all used in the same avatar model, we expect that if zebrafish cytokines had an effect on GAMs and their polarization, this effect would be consistent in all avatars, and can thus be ignored when comparing different GSCCs. More importantly, our findings in the zebrafish avatar model were consistent with those in the in vitro model. We observed the same phenotype switch in the co-culture model, indicating that the key interaction is between tumor cells and macrophages.

      Significance:

      Strengths and limitation. The manuscript is the result of a well-orchestrated effort to dissect a biological problem by complementary approaches and provide new data with high impact translational value. The image processing pipeline developed by the authors is a step forward in the in vivo analysis of cells interaction in living embryos. The identification of LGALS1 as a potential target for immune modulation can support the development of new therapeutical strategy implementing chemo- or immunotherapy protocols. The described zebrafish avatar can represent a new tool for personalized drug testing recapitulating in a in vivo model the heterogeneity of GBM found in patients.

      Audience: All the scientist interested in cell biology, cancer cell biology, imaging techniques, translational medicine, in vivo models for cancer research, precision medicine.

      Reviewer expertise: applied developmental biology

      Reviewer #2

      Evidence, reproducibility and clarity:

      Finotto et al aim to address the polarisation of macrophages within GBM in their study. To do this, they have developed two different models. The first model is an in-vitro co-culture model of patient derived GSC lines and human monocyte derived macrophages. This model was used for single cell sequencing to understand the transcriptomic changes of macrophages upon contact to GBM cells. The second model is a zebrafish xenograft model. Here GFP labeled GBM cells were transplanted into the larval zebrafish ventricle. These experiments were done in the transgenic mpeg zebrafish which allowed to monitor responses of macrophages in vivo.

      In my opinion both models are not sophisticated enough to draw solid conclusions on macrophage polarisation in GBM. The in vitro model is highly artificial and is far from the complex situation in GBM. Within GBM the GAM population represents a heterogenous mix of resident microglia and infiltrating macrophages. These are influenced by the heterogeneous environment (which consists of tumour cells but also other host cells) and show diverse transcriptomic adaptations as shown in rodent models as well as sequencing studies of patient derived tumour samples. Studying monocyte derived macrophages in vitro does not provide any reliable insight.

      Response: We understand the reviewer’s concern about the complexity of our in vitro model. However, these simple models are needed to gain more insight into the complex in vivo situation. Others have demonstrated their usefulness in the past (C. Jayakrishnan et al, 2019; Zhou et al, 2022; Hubert et al, 2016; Chen et al, 2020; Coniglio et al, 2016; Li et al, 2022). Moreover, it may be advantageous to look at only two different cell types and unravel their reciprocal interaction, without the influence of other cell types, making it too complex to draw conclusions. We acknowledge that GAMs are a heterogeneous mix of both microglia and bone marrow-derived macrophages. Considering that bone marrow-derived macrophages have been shown to play an important role in tumor progression and are by far the most abundant immune cell population in GBM tumors (which even increases in recurrent GBM) (Pombo Antunes et al, 2021; Abdelfattah et al, 2022), we chose to focus initially on bone marrow-derived macrophages. Notably, it has already been reported that microglia were associated with significantly better survival, suggesting that they are anti-tumorigenic, whereas macrophages were associated with worse survival, suggesting that they are pro-tumorigenic (Pombo Antunes et al, 2021; Abdelfattah et al, 2022). This justifies our approach to focus on this cell type. Furthermore, although this model may be rather simplistic, it allowed us to screen different GSCCs side by side in a standardized way, through which we found an apparent phenotype switch within the macrophages, even without the complex interplay with other cell types. Because the results obtained using the in vitro model were also confirmed in GBM patient material and KO experiments in the zebrafish avatar model, our work shows that reliable and important insights can be derived. 
      This, combined with its simplicity, makes our co-culture model an exceptionally relevant model that is scalable, screenable and allows us to study the effect of perturbations. Finally, the immunosuppressive role of the target we identified using this model, LGALS1, has been previously demonstrated by others (Verschuere et al, 2014; Van Woensel et al, 2017; Chen et al, 2019), which proves our approach is valid.

      Although the zebrafish can be a great model to understand the progression of tumours and the role of immune cells, I don't think that the model developed by the authors is suitable to address their questions. Transplantation of GBM cells into the ventricle of larval zebrafish doesn't seem to be the right approach here. The poor survival of the transplanted cells is a clear indication of that. Many other groups have reported growth and proliferation of human cancer cells in the larval zebrafish. Direct transplantation into the brain parenchyma would be the better approach here. The brain parenchyma would provide the right environment for the GBM cells including a resident microglial population. This would also allow to study the complex mix of microglia and infiltrating macrophages in the context of GBM.

      Response: The reviewer does not specify which articles have reported growth and proliferation of human cancer cells in zebrafish larvae. Most research groups reporting this, did not follow tumor growth/proliferation over time or used immortalized cell lines (Vargas-Patron et al, 2019; Pan et al, 2020; Pudelko et al, 2018; Breznik et al, 2017; Vittori et al, 2017; Hamilton et al, 2016), which obviously have a much higher proliferation rate than the patient-derived cell lines used in this work. Second, although the number of patient-derived tumor cells decreases over time, we observed a clear invasive and migratory behavior, indicating that the human tumor cells reside well in the zebrafish microenvironment. Furthermore, it is important to note that the zebrafish avatars are grown at 34°C, a temperature that is suboptimal for tumor cell growth. The tumor cells still proliferate, albeit at a lower rate than at 37°C.

      To our knowledge, there is only one publication that reports the growth of patient-derived GBM tumors over time (Almstedt et al, 2022). However, here, zebrafish embryos were grown at 33°C. Also, prior to injection, patient-derived GBM cells were resuspended in medium containing polyvinylpyrrolidone, a polymer that enhances extracellular matrix deposition and cell proliferation. Furthermore, the authors observed substantial differences in proliferative capacity, ranging from growth to decline of signal, and represented only two patient-derived cell lines with growing tumors. Similar to our findings, another article has demonstrated that injected patient-derived GBM tumor cells progressively underwent mitotic arrest, while maintaining an invasive and aggressive growth pattern (Rampazzo et al, 2013).

      Although the tumor cells are injected into the hindbrain ventricle, they end up in the brain parenchyma, as evidenced by the presence of the typical brain vasculature of the zebrafish embryo. Notably, in Tg(mpeg1:mCherryF)ump2 zebrafish embryos, both macrophages and microglia are labeled with mCherry, meaning that we have studied both cell types in our zebrafish avatar model. Therefore, we consider the reviewer’s comment to be unfounded.

      Reviewer #3

      Evidence, reproducibility and clarity:

      In this study, Finotto and colleagues developed patient-derived Glioblastoma (GBM) stem cell cultures from 7 patients. These GBM stem cell cultures were either co-cultured in vitro with human macrophages combined with single-cell RNA sequencing or injected into the orthotopic zebrafish xenograft to study live GBM-macrophage/microglia interactions. Authors aimed at studying tumor heterogeneity and GBM-associated macrophages (GAMs) which often exhibit immunosuppressive features that promote tumor progression. Their analyses revealed substantial heterogeneity across GBM patients in GBM-induced macrophages polarization and the ability to attract and activate GAMs - features that correlated with patient survival. Also authors show 3 distinct macrophage subclusters (MC1-3), highlighting that the simple M1/M2 polarization phenotypes is too reductive and there are no clear "markers". Authors associate these profiles with morphology and macrophage behaviour. Differential gene expression analysis, immunohistochemistry on original tumor samples, and knock-out experiments in zebrafish subsequently identified / confirmed that LGALS1 as a primary regulator of immunosuppression.

      Chen et al. (DOI: 10.1002/ijc.32102) had previously shown the immunosuppressive effect of LGALS1, but this work shows as a proof of concept that the authors' approach is a valuable and interesting way to find immune regulators.

      Response: We fully agree with Reviewer #3. In fact, the immunosuppressive role of LGALS1 has already been described by several research groups (Van Woensel et al, 2017; Verschuere et al, 2014), which indeed proves that our approach is valid. The reference cited by the reviewer was already included in the manuscript, along with other references.

      Major comments:

      In general, claims are supported by data: very carefully presented and well characterized data with numbers and stats. It is an interesting descriptive study that illustrates the complexity and diversity of glioblastoma and the induced TME. I just have a few comments or clarifications that I would like to have elucidated:

      • I did not understand why not single cell sequence the original tumor - without in vitro passaging and have the original patient population of MACs/microglia and monocytes sequenced? In other words why sequence the in vitro system-with its inherent caveats of in vitro culturing and not the original tumor? Can you please clarify.

      Response: We agree with Reviewer #3 that our in vitro model does indeed have caveats inherent to patient-derived cell culture models. However, we chose this model to specifically focus on the reciprocal interaction between GBM tumor cells and macrophages in a way that also allows us to investigate how perturbations affect these interactions. This is not possible when using original tumors (e.g. we cannot make KO cells, as we did for LGALS1, and study the effects of genes of interest). (See also the response to the comment of Reviewer #2)

      We do have scRNAseq data from one original tumor sample (LBT123) that is currently being analyzed. Unfortunately, scRNAseq is not available for the other tumor samples. Also, for some of the patients, there is no original material left to use for sequencing. For LBT123, we will compare the scRNAseq data from the original tumor with the in vitro data from the co-culture model.

      • Mac signatures - out of curiosity- authors could not find TNFa and IFN signatures in any population?

      Response: Our analyses did not reveal TNF or IFN as cluster signature genes. However, we did find that TNF expression was slightly higher in MC2, the pro-inflammatory macrophages, although still at low levels. We did not find IFN expression in the macrophage subclusters, but we did find low expression of some IFN receptors. We found a gradient for IFNGR1 with the highest expression in MC3, followed by MC1 and the lowest expression in MC2. IFNGR2 was expressed at slightly higher levels in MC1 compared to the other subclusters. IFNAR1 and IFNAR2 were expressed at comparable low levels in all subclusters. Finally, IFNLR1 expression was higher in MC3 compared to the other two macrophage subclusters. Considering the overall low expression of IFN receptors, we believe that the differences in expression are rather negligible. Furthermore, it has been previously shown that IFN exerts its anti-tumor effect primarily through the responsiveness of endothelial cells and not of myeloid cells, such as macrophages (Kammertoens et al, 2017). Since vascular cells were not present in the co-culture model, low IFN receptor expression is not surprising. We are happy to investigate this in more detail and include it if deemed useful.

      • Figure 8: please show controls side by side with the KO

      Response: We thank Reviewer #3 for this comment. We are not quite sure which panel the reviewer is referring to. If it is panel F, we agree with Reviewer #3 and have changed the order of the bars in the revised version. If it is panel E, the corresponding control images are shown in Figure 5I. Since we believe that these images should not be repeated, we have added a figure reference to Figure 5I in the figure legend of Figure 8, in addition to the figure reference already provided in the text. Furthermore, images of all embryos are presented side by side in Figure S8D-E.

      • Figure 5: if each pair of images were separated and had the legend on top, it would be easier to read and follow.

      Response: We appreciate the comment that the figure should be intuitively easy to read and follow. However, we have chosen a compromise between overview and visibility of details (e.g. morphological features of GAMs). Since this figure already has the maximum width, the images would become smaller if they needed to be separated. Reducing the size would compromise the visibility of important details.

      Significance:

      It is a very interesting study, carefully designed and performed, that highlights the heterogeneity of glioblastoma and how GBM can modulate the macrophage population into 3 different subsets. This study constitutes a proof of concept of the combination of an in vitro approach and an in vivo approach to find new players and treatments in glioblastoma. I believe that it would be important and interesting to have the original tumor sequenced to compare to the in vitro platform and understand how the in vitro selection impacts the tumor biology and even if it changes the heterogeneity and differential composition of the tumor and macrophage profiles.

      References:

      Abdelfattah N, Kumar P, Wang C, Leu JS, Flynn WF, Gao R, Baskin DS, Pichumani K, Ijare OB, Wood SL, et al (2022) Single-cell analysis of human glioma and immune cells identifies S100A4 as an immunotherapy target. Nat Commun 13

      Almstedt E, Rosen E, Gloger M, Stockgard R, Hekmati N, Koltowska K, Krona C & Nelander S (2022) Real-time evaluation of glioblastoma growth in patient-specific zebrafish xenografts. Neuro Oncol 24: 726–738

      Breznik B, Motaln H, Vittori M, Rotter A & Turnšek TL (2017) Mesenchymal stem cells differentially affect the invasion of distinct glioblastoma cell lines. Oncotarget 8: 25482–25499

      Jayakrishnan P, H. Venkat E, M. Ramachandran G, K. Kesavapisharady K, N. Nair S, Bharathan B, Radhakrishnan N & Gopala S (2019) In vitro neurosphere formation correlates with poor survival in glioma. IUBMB Life 71: 244–253

      Chen JWE, Lumibao J, Leary S, Sarkaria JN, Steelman AJ, Gaskins HR & Harley BAC (2020) Crosstalk between microglia and patient-derived glioblastoma cells inhibit invasion in a three-dimensional gelatin hydrogel model. J Neuroinflammation 17

      Chen Q, Han B, Meng X, Duan C, Yang C, Wu Z, Magafurov D, Zhao S, Safin S, Jiang C, et al (2019) Immunogenomic analysis reveals LGALS1 contributes to the immune heterogeneity and immunosuppression in glioma. Int J Cancer 145: 517–530

      Coniglio S, Miller I, Symons M & Segall JE (2016) Coculture assays to study macrophage and microglia stimulation of glioblastoma invasion. Journal of Visualized Experiments 2016

      Hamilton L, Astell KR, Velikova G & Sieger D (2016) A zebrafish live imaging model reveals differential responses of microglia toward glioblastoma cells in vivo. Zebrafish 13: 523–534

      Hubert CG, Rivera M, Spangler LC, Wu Q, Mack SC, Prager BC, Couce M, McLendon RE, Sloan AE & Rich JN (2016) A three-dimensional organoid culture system derived from human glioblastomas recapitulates the hypoxic gradients and cancer stem cell heterogeneity of tumors found in vivo. Cancer Res 76: 2465–2477

      Kammertoens T, Friese C, Arina A, Idel C, Briesemeister D, Rothe M, Ivanov A, Szymborska A, Patone G, Kunz S, et al (2017) Tumour ischaemia by interferon-γ resembles physiological blood vessel regression. Nature 545: 98–102

      Li H, Yan X & Ou S (2022) Correlation of the prognostic value of FNDC4 in glioblastoma with macrophage polarization. Cancer Cell Int 22

      Pan H, Xue W, Zhao W & Schachner M (2020) Expression and function of chondroitin 4-sulfate and chondroitin 6-sulfate in human glioma. FASEB Journal 34: 2853–2868

      Pombo Antunes AR, Scheyltjens I, Lodi F, Messiaen J, Antoranz A, Duerinck J, Kancheva D, Martens L, De Vlaminck K, Van Hove H, et al (2021) Single-cell profiling of myeloid cells in glioblastoma across species and disease stage reveals macrophage competition and specialization. Nat Neurosci 24: 595–610

      Póvoa V, Rebelo de Almeida C, Maia-Gil M, Sobral D, Domingues M, Martinez-Lopez M, de Almeida Fuzeta M, Silva C, Grosso AR & Fior R (2021) Innate immune evasion revealed in a colorectal zebrafish xenograft model. Nat Commun 12

      Pudelko L, Edwards S, Balan M, Nyqvist D, Al-Saadi J, Dittmer J, Almlöf I, Helleday T & Bräutigam L (2018) An orthotopic glioblastoma animal model suitable for high-throughput screenings. Neuro Oncol 127: 415

      Rampazzo E, Persano L, Pistollato F, Moro E, Frasson C, Porazzi P, Della Puppa A, Bresolin S, Battilana G, Indraccolo S, et al (2013) Wnt activation promotes neuronal differentiation of glioblastoma. Cell Death Dis 4

      Van Woensel M, Mathivet T, Wauthoz N, Rosière R, Garg AD, Agostinis P, Mathieu V, Kiss R, Lefranc F, Boon L, et al (2017) Sensitization of glioblastoma tumor micro-environment to chemo- and immunotherapy by Galectin-1 intranasal knock-down strategy. Sci Rep 7: 1–14

      Vargas-Patron LA, Agudelo-Dueñas N, Madrid-Wolff J, Venegas JA, González JM, Forero-Shelton M & Akle V (2019) Xenotransplantation of human glioblastoma in zebrafish larvae: in vivo imaging and proliferation assessment. Biol Open 8

      Verschuere T, Toelen J, Maes W, Poirier F, Boon L, Tousseyn T, Mathivet T, Gerhardt H, Mathieu V, Kiss R, et al (2014) Glioma-derived galectin-1 regulates innate and adaptive antitumor immunity. Int J Cancer 134: 873–884

      Vittori M, Breznik B, Hrovat K, Kenig S & Lah TT (2017) RECQ1 helicase silencing decreases the tumour growth rate of U87 glioblastoma cell xenografts in zebrafish embryos. Genes (Basel) 8

      Zhou F, Shi Q, Fan X, Yu R, Wu Z, Wang B, Tian W, Yu T, Pan M, You Y, et al (2022) Diverse macrophages constituted the glioma microenvironment and influenced by PTEN status. Front Immunol 13

    1. Author Response

      Reviewer #1 (Public Review):

      The paper describes a robotic system that can be used for prolonged recording of forced activity in crawling Drosophila larvae. This is mostly intended to be a proof of principle description of a tool potentially useful for the community. The system - whose value lies completely in its reproducibility and adoption - is only superficially described in the paper, but a more detailed description is made available through Github, along with the software used for the collection and analysis of data.

      There is good, convincing evidence this can work as some sort of "larval conveyor belt", used to artificially prolong food crawling behaviour in the animals. More could be said about the ecological implications of the assay (for instance: how relevant is it to an animal's natural behaviour? Does the system introduce artifactual distortions in the analysis, driven by the fact that animals crawl greater distances than they would normally crawl in nature? Will this extensive activity affect their development to pupation or adulthood?).

      In addition to all our code being available on GitHub, we have added substantially to Materials and Methods in the manuscript (1-1.5 pages), detailing the analysis pipeline more thoroughly.

      We agree that a more thorough comparison of ecological vs. laboratory conditions was warranted here, and have addressed this in new Discussion section material (6th paragraph especially). The developmental effect due to prolonged locomotion is a very good point – with only a single animal measured for more than 24 hours, we do not yet know whether instar molting or pupation is delayed, but this could certainly be a concern in longer experiments moving forward.

      Reviewer #3 (Public Review):

      "Continuous, long-term crawling behavior characterized by a robotic transport system" by Yu et al. presents their new robotic device to track, reposition, and feed Drosophila larvae as they crawl on an arena. By using a water droplet (or if necessary, suction) to transport larvae from the edge of the arena to the middle, long behavior trajectories can be recorded without losing larvae from the arena or camera field of view. The picker robot is also able to dispense small amounts of apple juice at precise locations to keep larvae alive for extended periods although the food was not sufficient to trigger molting and the development to the next instar stage.

      The approach is interesting, but the authors could provide more details on why the approach is necessary for non-expert readers. For example, what are the advantages of using the robot picker compared to simply confining larvae in a closed arena? It's not obvious (to me) that being picked back to the center of the arena is a smaller perturbation compared to running into a chamber wall and changing direction.

      Thank you for this suggestion, it’s a very good point. We have expanded our Introduction considerably, and directly address this issue (4th paragraph in particular). We do quantify the perturbation due to robot pick-ups and drop-offs (Fig. 3D), but that only addresses the short term. We prefer not to use a closed arena for three reasons: (1) in a gradient navigation experiment, reaching the edge would effectively end “navigation” and we would be unable to study that behavior over longer times, (2) larvae can crawl up the sides of walls and will be lost to the tracker (they do this all the time in the Petri dishes they are raised in), and (3) larvae often do not bounce off walls and resume crawling, they tend to dwell near edges they find. To this last point, we have added a new Supplemental figure (Figure 1 – supplement 1) illustrating this effect with a representative example.

      The first paragraph of the introduction emphasizes the multiple time scales that are relevant for behavior from rapid stimulus response up to developmental times. This is to set the context of the authors' contribution but I'm not sure it's a fair representation of the state of the art. For example, the authors state that high-bandwidth measurement over long times is prohibitive and cite three Drosophila papers, but there are home-cage monitoring systems that allow continuous recording of mouse behavior over long times with high resolution. At the other end of the spectrum, there have been some long-term behaviour experiments done on worm behaviour with reasonably high time resolution (e.g., Stern et al. 10.1016/j.cell.2017.10.041).

      This is absolutely correct, the context needed to be much broader than our own prior larva results. We have overhauled that section and written a wider introduction that includes the C. elegans paper you mentioned, and also brings in other model systems like adult flies, mice, and rats. We frame our own work as (1) in a new animal, for long term measurements; (2) investigating non-confined free locomotion over a long time scale.

      The authors train a neural network to segment and track the larvae, however, little information is given on the training process and I don't think it would be possible to reproduce the model based on the description. More details of the network, hyperparameters, and training data would be required to evaluate it.

      Definitely! We have added a new section to Materials and Methods (1-1.5 pages in length), detailing our analysis pipeline, with sections for position tracking, postural analysis, and behavioral classification.

      The authors also state several times that larval identity is maintained throughout the recording, but this isn't quantified. It's not clear whether identity is maintained across collisions of two or more animals by the tracking algorithm or whether these collisions simply don't happen in their data because density is low.

      This has also been addressed and clarified in the same new part of the Materials and Methods section. We quantify collision rates and give the accuracy of maintaining identity after collisions.

      The environment is nominally isotropic, but once larvae have been crawling on the surface for hours, including periodic feeding, there will likely be multiple gradients the larvae may sense. This may not be observable in the data, but should perhaps be mentioned in the text.

      This is certainly true. Other than the single animal 30-hour experiment described in the manuscript, there is no food introduced to the larvae during our 6-hour experiments. Looking ahead, the presence of food remnants in the arena could become a serious confounding factor in nominally isotropic experiments, as the reviewer points out. We have added substantially to the Discussion section to discuss various limitations of the design and experiments, and directly talk about the odor/taste stimuli being introduced by food (second to last paragraph in Discussion).

      The authors show that the picking action results in a small but detectable increase in speed. The degree of perturbation overall depends on the picking frequency so some quantification of the inter-pick time interval would help to interpret whether this perturbation is relevant for a particular experiment. Is there a difference in excitation when larvae are picked successfully on the first try compared to when multiple tries or suction are required?

      We have now quantified the amount of time between pickups and added that in the Materials and Methods section directly (it’s 0.87 pick-ups per hour per animal). We do not have a sufficient amount of data to determine whether there is a statistically significant difference in behavior for multiple pickup attempts – this can also be confounded because sometimes an unsuccessful pickup is one that does not touch the larva at all (so would presumably not introduce additional perturbations).
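
      The reviewer asked for an inter-pick time interval, while the response reports a rate. As a rough conversion, the sketch below takes the quoted 0.87 pick-ups per hour per animal and derives a mean interval and an expected count for a 6-hour experiment. It assumes the mean interval is simply the reciprocal of the average rate; the function names are illustrative and not taken from the authors' code.

```python
# Convert the reported pick-up rate into a mean inter-pick interval.
# Assumption: mean interval = 1 / average rate (no information on the
# actual distribution of intervals is given in the response).

PICKUPS_PER_HOUR = 0.87  # per animal, as quoted in the response


def mean_interval_minutes(rate_per_hour: float) -> float:
    """Mean time between events, in minutes, for a given hourly rate."""
    if rate_per_hour <= 0:
        raise ValueError("rate must be positive")
    return 60.0 / rate_per_hour


def expected_pickups(rate_per_hour: float, duration_hours: float) -> float:
    """Expected number of pick-ups per animal over an experiment."""
    return rate_per_hour * duration_hours


interval_min = mean_interval_minutes(PICKUPS_PER_HOUR)
pickups_6h = expected_pickups(PICKUPS_PER_HOUR, 6.0)
print(f"{interval_min:.0f} min between pick-ups; {pickups_6h:.1f} pick-ups in 6 h")
```

      Under this assumption, an animal is transported roughly once every 69 minutes, or about five times in a 6-hour experiment.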

      From the reconstructed trajectory in Figure 4, this interval looks very long compared to speed increase after picking. When reconstructing the trajectory, how are the segments joined? Is it simply by resetting the xy position or also updating rotating to match the previous direction of travel? (I'm guessing the larva can rotate during transport?)

      We have updated the Figure 4 caption to make it clear that the segments are only joined translationally, by resetting the xy position.
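
      A translation-only join of this kind can be sketched as follows. This is a minimal illustration, assuming each segment is an array of xy positions recorded between two robot transports; the function name and the choice to keep the duplicated join point are assumptions, not the authors' implementation.

```python
import numpy as np


def stitch_segments(segments):
    """Join trajectory segments by translation only.

    Each segment is an (n_i, 2) array of xy positions. Every segment after
    the first is shifted so that its first point coincides with the last
    point of the path stitched so far. No rotation is applied, so the
    heading within each segment is left exactly as recorded.
    """
    stitched = [np.asarray(segments[0], dtype=float)]
    for seg in segments[1:]:
        seg = np.asarray(seg, dtype=float)
        offset = stitched[-1][-1] - seg[0]  # xy reset at the join
        stitched.append(seg + offset)
    return np.vstack(stitched)


# Two hypothetical segments: the second starts at the drop-off location.
seg_a = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]])
seg_b = np.array([[10.0, 10.0], [10.0, 11.0]])
path = stitch_segments([seg_a, seg_b])
print(path)
```

      Because the larva may rotate during transport, the heading at the start of a segment need not match the heading at the end of the previous one; a translation-only join preserves that discontinuity rather than hiding it.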

      The authors present a simple model in Figure 6 to illustrate the differences between individuals that can be hidden when looking at population distributions. However, the differences they show in the simulation don't seem relevant to the differences they observe in the experiments. Specifically, Fig. 6A and B show a contrast between individuals with similar mean speeds compared to individuals with different (but still unimodal) mean speeds. In contrast, the experimental data in Fig. 6D shows individual distributions that are quite similar but that are bimodal. So, there is indeed a difference between the individual distributions that is obscured in the population distribution, but is there evidence of larval personality types (line 444)? Similarly, the sentence beginning line 381 doesn't seem right either.

      We are really glad this was brought up so that we could clarify better in the text, as it’s an important point. We have edited the text in the Results subsection related to Figure 6 and the Figure 6 caption to clear things up. The individual distributions in 6D are not bimodal, there are 38 traces shown that are all essentially unimodal. In addition to stating this directly in the text, we have quantified this by adding the average BC for individuals in both isotropic and thermal gradient contexts (they are essentially the same, i.e. equally unimodal in both cases).
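
      To illustrate the kind of quantification described here, the sketch below assumes that "BC" denotes the bimodality coefficient, which the response does not spell out. It uses simple moment estimators of skewness and excess kurtosis with the usual finite-sample correction term in the denominator; values above 5/9 (the value for a uniform distribution) are conventionally read as suggesting bimodality. The simulated speeds are hypothetical and are not the authors' data.

```python
import numpy as np


def bimodality_coefficient(x):
    """Bimodality coefficient from skewness and excess kurtosis.

    Uses plain moment estimators (not bias-corrected), which is adequate
    for large samples. Values above 5/9 suggest a bimodal distribution.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 samples")
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / m2 ** 1.5
    excess_kurt = np.mean(d ** 4) / m2 ** 2 - 3.0
    correction = 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (skew ** 2 + 1.0) / (excess_kurt + correction)


rng = np.random.default_rng(0)

# Hypothetical individuals: each has a unimodal speed distribution,
# but half are slow (mean 1.0) and half are fast (mean 4.0).
means = np.repeat([1.0, 4.0], 10)
individuals = [rng.normal(m, 0.5, size=2000) for m in means]

individual_bc = [bimodality_coefficient(s) for s in individuals]
pooled_bc = bimodality_coefficient(np.concatenate(individuals))

print(f"mean individual BC: {np.mean(individual_bc):.2f}")
print(f"pooled population BC: {pooled_bc:.2f}")
```

      In this toy case every individual stays below the 5/9 threshold while the pooled population exceeds it, which is the situation in which a population histogram would misrepresent the individuals.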

    1. Author Response

      Reviewer #1 Public Review:

      1) “…The authors make reasonable assertions, but all of these need to be validated by electrophysiological studies before they can be treated as fact. Instead, they should be treated as predictions. For example, in the conclusions from the model section, that endbulb size does not strictly predict synaptic efficacy should be modified from an assertion to a prediction.”

      The reviewer makes an important point. We realize that, despite describing the data as the output of a model, we needed to be clearer that the model output is in fact a set of predictions to be tested experimentally. In the reorganization of the results, we collect the model output explicitly in a section named “Model Predictions”, and list five classes of predictions that describe explorations of bushy cells. The fifth set of predictions was previously a separate section but should now be better appreciated as conveying hypotheses since it is incorporated into this newly named section. Please note that the hypotheses are constrained to varying extents by the high-resolution structural data we present, such as the estimation of synaptic weights from the counts of synapses. The compartmental models for each bushy cell also are constrained by the structural data and published biophysical and electrophysiological properties of the cells. The pipeline to create the models is described in its own section now using that terminology: “A pipeline for translating high-resolution neuron segmentation into compartmental models consistent with in vitro and in vivo data.”, which we hope conveys the notion that the modeling framework is indeed a template that can be applied to future experimental data. We explicitly make this latter point in the new Discussion section “Toward a complete computational model for globular bushy cells: strengths and limitations”.

      Reviewer #2 Public Review:

      1) “… While this is technically impressive (in regards to both the structure and modelling) there are significant weaknesses because this integration makes massive assumptions and lacks a means of validation; for example, by checking that the results of the structural modelling recapitulate the single-cell physiology of the neuron(s) under study. This would require the integration of in vivo recorded data, which would not be possible (unless combined with a third high throughput method such as calcium imaging) and is well beyond the present study.”

      We appreciate the support for our approach, and we now make explicit in the manuscript that the output of the models should be interpreted as predictions for eventual experimental testing. We also consider in the Discussion some experimental procedures that might be used to test the predictions. Ca2+ imaging is currently too slow a reporter for the rapid synaptic events and integration time constant of bushy cells, as the reviewer knows, and we think (and present in the Discussion, section 2) that focal optical stimulation combined with simultaneous recording from fast voltage sensors is a potential avenue to achieve this goal.

      2) The authors need to be more open about the limitations of their observations and their interpretations and focus on the key conclusions that they can glean from this impressive data set.

      As indicated in response to a similar comment from Reviewer 1, we now collect and discuss the primary limitations in a new section within the Discussion, entitled “Toward a complete computational model for globular bushy cells: strengths and limitations”.

      3) The manuscript would be considerably improved by re-writing to focus the science on the most important results and provide clear declarations of limitations in interpretation.

      We have extensively re-organized and re-written the text to highlight the key structural observations (Figures 1-3, 7-8), the pipeline from structure to model (Figure 4) and interleave structural observations with the outputs of the model (Figures 5-6, 8). The latter are explicitly detailed in a new section called “Model Predictions”. These predictions are organized into five classes. We think that this new organization will improve communication of the key results, and further highlights the key discoveries from structural analysis and predicted functional mechanisms as explored in the compartmental models.

      Reviewer #3 Public Review:

      1) The authors extract here from the longer introductory commentary a one-sentence summary of the strengths of the manuscript, and thereafter focus on the weaknesses, since this document emphasizes our response to those critiques. To quote reviewer #3: “The strengths of this paper are that the authors obtained unprecedented high-resolution 3-D images of the AN-bushy cell circuit, and they implemented a biophysical model to simulate the neural processing of AN inputs based on these structural data. … The biophysical modeling, although lacking comparison with in vivo physiological data due to the chosen species (mice), is also solid and well documented.”

      We appreciate that the reviewer acknowledges the attention to detail that entered into the nanoscale imaging, cell reconstructions, building the modeling pipeline and constructing the compartmental models.

      2) Despite the high quality of the data, the paper is marred by the species they chose: there are very few published in vivo single-unit results from mouse bushy cells, so it is hard to evaluate how well the model predictions fit the real-world data, and how the structural findings address the “fundamental questions” in physiology. … No rationale (e.g. use of molecular tools or in vitro physiology) is given why the authors focus on the mouse. It seems that the analyses provided here could as well have been done on a species with good low-frequency hearing, which may have provided a much more interesting case for understanding the spectacular temporal transformation performed by bushy cells.

      We now report our reasons, in the first paragraph of the Results, for selecting the mouse. One reason for choosing mouse was that biophysical properties of bushy cells, which were important parameters to constrain the compartmental models, were collected from mice. These data are collected from dissociated cells and from brain slices, and these experiments continue to be more tractable in mice. The second reason is that mice are used in nanoscale and light microscopy connectomic studies because their neurons, cell groups and entire brain are smaller, so that a given volume of imaged brain will contain more cellular elements. These other connectomic studies provide a template for eventual comparisons among brain regions. Our overall goal is to image the entire cochlear nucleus, and the size of the mouse brain makes this goal tractable given current technology. Indeed, we are currently analyzing an image volume of the more rostral ventral cochlear nucleus that is about 5x larger than this image volume and collected with a much better signal to noise ratio. The third reason for choosing mouse was so that the current project could be augmented by genetic tools to further classify cochlear nucleus (CN) neurons and their extrinsic inputs, and potentially manipulate neural circuits in future studies. For example, the atoh7 (math5) and hhip gene products are markers for subsets of bushy cells, suggesting the presence of molecular subtypes of this cell class (Jing et al. 2023).

      3) If we look at data from other animals such as cats and gerbils, it is true that high-frequency (globular) bushy cells show envelope phase locking, but compared to ANs they are at best only moderately enhanced (gerbils: Frisina et al. 1990: Fig 7 and 10; cats: Joris and Yin 1998 Fig 4); the most prominent enhancement is actually to the temporal fine structures of low-frequency bushy cells (cells tuned to < 1 kHz), which mice lack. Furthermore, the temporal modulation transfer function (tMTF, i.e. the vector strengths vs modulation frequency plots in Fig 7O of the paper) of (globular) bushy cells are mostly low-pass filtered, with a cutoff frequency close to 1 kHz, and the highest vector strength rarely surpasses 0.9 (cats: Rhode 1994 Fig 9, 16, Rhode 2008 Fig 8G, Joris and Yin 1998 Fig 7; and there's one report from mice: Kopp-Scheinpflug et al 2003 Fig 8). Thus, the band-pass tMTFs tuned to 100-200 Hz with vector strengths > 0.9 or 0.95 in this paper (Fig 7O, Fig 8M) do not really match known physiology (in non-mouse species). Again, we know very little about in vivo physiology of mouse (globular) bushy cells and there is of course a possibility that responses in mice may be closer to the predictions of this paper.

      We agree that there are (unfortunately) few studies in mouse that can be compared with our simulations. With regard to the tMTFs, we can make a couple of points. First, we note that the stimuli used for all the panels except P2 in Figure 6 (previous Figure 7) were at 15 dB SPL, which is the level where maximal envelope phase-locking occurs in the low-threshold ANF inputs. This choice was based on previous experimental work that examined the intensity dependence for SAM stimuli in the auditory nerve (Smith and Brachman, 1980; Joris and Yin, 1992; Cooper et al, 1993; Dreyer and Delgutte, 2006, Figure 2B, Figure 3). Second, Figure 6, Supplemental Figure 1 confirms the behavior of the auditory nerve model used for input to the bushy cells (Rudnicki and Hemmert (2017) implementation), replicating Zilany et al., 2009, Figure 13D. These results show that phase-locking decreases at higher intensities as expected from the experimental work. Relevant to this topic, the lone report of responses to SAM stimuli in mice (Kopp-Scheinpflug et al. 2003) used 100% SAM at CF at 80 dB SPL. At this high intensity, it is expected that the envelope phase locking at CF will be less than at lower intensities because of rate saturation in the high and medium spontaneous rate ANFs (Carney, JARO 2019; Joris and Yin, 1998). In guinea pig, envelope phase locking is greater in low-SR fibers at 80 dB SPL than in medium and high SR fibers, but it is still lower than at its peak at about 50 dB SPL (Cooper et al., 1993). All of these experimental observations therefore lead to the prediction that the SAM envelope locking in Kopp-Scheinpflug et al. (2003) should be lower than in our simulations.

      In addition, Kopp-Scheinpflug et al. (2003) did not report from which VCN cell populations the cells were recorded. If the recorded cells were a heterogeneous mixture of bushy and multipolar cells, then their data are not directly comparable to our model predictions. The stimulus intensity also needs to be considered for comparison with the work of Rhode (1994), whose lowest stimulus level is 30 dB SPL (Figure 9), and who also used a different stimulus, 200% SAM, and with the work of Frisina et al. (1990), who used 50 dB SPL. Interestingly, Figure 14D in Rhode (1994) shows a synchrony coefficient ranging from 0.5 to 0.9 at 30 dB SPL at 300 Hz modulation, which is similar to what we predict in Figure 6P2. We also remind the reviewer that our simulations did not include the effects of feed-back inhibition at CF (Caspary and Palombi, 1994; Campagnola and Manis, 2014; Xie and Manis, 2014, Keine et al. eLife 2016), which may affect phase synchrony in complex ways (Gai and Carney, 2008). One important feedback pathway arises from the tuberculoventral cells of the DCN (Wickesberg and Oertel, 1991; Campagnola and Manis, 2014), but the envelope synchrony behavior of those cells is not known.

      Thus, we now emphasize in the revised manuscript (in the Discussion) considerations of stimulus intensity used across published studies, citing the works above, the relatively high vector strengths at low modulation frequency, and that these simulation results are currently predictive. The simulations are also limited in that we used only one configuration of ANF inputs (low-threshold, high SR). This ANF SR category was selected to be consistent with the suggestion by Liberman (1991) that the globular BCs receive input principally from the low-threshold high-SR fibers. Mixtures of input SR classes would be expected to change the envelope representation at higher intensities. Finally, the parameter space is quite large (intensity × frequency × ANF distributions × inhibition) and is better explored in a separate study once we are able to provide better or additional constraints to the modeling framework. Also, to put the selection of SAM stimuli in context, we indicate that mice can encode temporal fine structure although only as low as 1 kHz, but at similar VS to larger rodents such as guinea pig (Taberner and Liberman 2005; Palmer and Russell 1986).
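
      For readers less familiar with the vector strength (VS) values compared throughout this exchange, the sketch below computes the standard definition: the length of the mean resultant vector of spike phases relative to the modulation period. The spike trains are synthetic and the function name is illustrative; this is not the analysis code used in the manuscript.

```python
import numpy as np


def vector_strength(spike_times, mod_freq_hz):
    """Vector strength of spike times (s) relative to a modulation frequency.

    Each spike is mapped to a phase on the unit circle; VS is the length of
    the mean of those unit vectors. VS = 1 means every spike falls at the
    same phase, VS near 0 means spikes are spread evenly over the cycle.
    """
    spike_times = np.asarray(spike_times, dtype=float)
    if spike_times.size == 0:
        return float("nan")
    phases = 2.0 * np.pi * mod_freq_hz * spike_times
    return float(np.abs(np.mean(np.exp(1j * phases))))


f_mod = 100.0  # modulation frequency in Hz (hypothetical)
period = 1.0 / f_mod

# Perfectly phase-locked train: one spike per cycle at a fixed phase.
locked = np.arange(50) * period + 0.002

# Unlocked train: spike times uniform over 50 full modulation cycles.
rng = np.random.default_rng(1)
unlocked = rng.uniform(0.0, 50 * period, size=5000)

vs_locked = vector_strength(locked, f_mod)
vs_unlocked = vector_strength(unlocked, f_mod)
print(f"locked VS = {vs_locked:.3f}, unlocked VS = {vs_unlocked:.3f}")
```

      A temporal modulation transfer function of the kind discussed above is obtained by repeating this calculation across modulation frequencies for responses recorded or simulated at each one.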

      Reviewer 4: Public comments

      1) The authors have collected an impressive array of physiological data and provided some beautiful 3D images of SBCs with dendrites. These are clearly strengths. The computational models for mechanisms of SBC responses, however, are made to fit what may be inadequate anatomical data. Instead of conclusions, perhaps they need to reword their discussions to refer to the anatomy as hypothetical substrates.

      It is true that the SBEM image volumes have strengths and limitations. We now collect these considerations in the second section of the Discussion, “Toward a complete computational model for globular bushy cells: strengths and limitations”. One limitation of this volume is that we do not have sufficient resolution to categorize synaptic vesicles by shape and must infer their excitatory or inhibitory nature. Note that tracing inputs to a source neuron, such as tracing the endbulbs to parent auditory nerve fibers, solves this problem, but the smaller terminals remain problematic in this regard. The goal is to not only assign excitatory or inhibitory phenotype, but also a cell type of origin, so that actual spike patterns, evoked by sound, can be provided as inputs to the model. The compartmental model is detailed, and amenable to mapping this information from other experiments as it becomes available. Nanoscale imaging does provide detailed structural information in terms of surface areas, volumes and process diameters that is important in constraining the compartmental models, and that is not attainable by standard light microscopy approaches. These points are now made in the Results and in the Discussion, as mentioned earlier in this paragraph. And, as indicated in the responses to other reviewers, we highlight the model outputs as predictions to be tested experimentally.